Research Reports
RW Wilson, PT, PhD, is Assistant Professor of Physical Therapy, University of South Florida College of Medicine, and Researcher, Health Outcomes and Behavior Program, H. Lee Moffitt Cancer Center and Research Institute
LM Hutson, MS, ARNP, is Patient Care Coordinator, Comprehensive Breast Program, H. Lee Moffitt Cancer Center and Research Institute, Tampa, Fla
D VanStry, PT, BS, is Physical Therapy and Occupational Therapy Coordinator, H. Lee Moffitt Cancer Center and Research Institute
Address all correspondence to Dr Wilson at School of Physical Therapy, MDC 77, University of South Florida College of Medicine, 12901 Bruce B. Downs Blvd, Tampa, FL 33612-4766 (USA) (rwilson{at}hsc.usf.edu)
Submitted July 12, 2004;
Accepted March 9, 2005
Abstract
Key Words: Breast carcinoma, Health status, Lymphedema, Outcome assessment, Quality of life, Reproducibility of findings
Introduction
Health-related quality of life (HRQOL) has been described as the effect that a medical condition or its treatment has on a person.2 Although the concept and measurement of HRQOL remain controversial, most questionnaires operationally define quality of life according to the World Health Organization definition of health3 by assessing at least 3 domains of well-being: physical, emotional, and social.2,4 Health-related quality-of-life assessment tools typically take the form of questionnaires completed by patients and may be classified as either generic or condition-specific. Measures of condition-specific HRQOL are intended for use with people diagnosed with particular diseases, such as diabetes or cancer. Generic questionnaires are designed to be used by anyone, allowing health status comparisons across groups of patients as well as people without disease or impairment.4,5 Measures designed for specific populations are believed to be more relevant and practical to patients because they address prevalent physical findings and symptoms.2,5 Generic health status measures, however, are believed to discriminate among patient groups on important dimensions of overall health and functioning, making them potentially useful as dependent measures in randomized clinical trials.2,5 Because quality of life is a multifaceted phenomenon, HRQOL instruments often have a subscale structure, allowing outcomes to be presented as a profile. These profiles identify specific domains in which patients perform well or fairly well and others in which they perform less well. Most instruments also include summary items that permit the assessment of overall HRQOL.2
There are good reasons to compare quality-of-life scores obtained from women with breast cancer with and without secondary lymphedema using different HRQOL questionnaires. Previous studies6-18 have examined the consequences of lymphedema and other treatment-related side effects in this population, with conflicting, and sometimes contradictory, results. Comparison of outcomes across different studies is hampered by the fact that a variety of health status questionnaires have been used to assess quality of life. No single instrument has emerged as the criterion measure of quality of life for studies involving patients with breast cancer, and the ability of available HRQOL questionnaires to detect quality-of-life deficits in women with lymphedema secondary to breast cancer is largely unknown.19-21 Moreover, some authors have used only summary scores to report overall quality-of-life outcomes for women with breast cancer.13,15 Careful interpretation of the results of these studies is warranted, because it is unclear whether contradictory results reflect underlying differences between populations, differences in intervention efficacy, or differences in the content and psychometric properties of the quality-of-life instruments used.22
In this study, we compared 2 HRQOL questionnaires, the RAND 36-Item Health Survey (SF-36)4,23 and the Functional Living Index-Cancer (FLIC),24 in a sample of women treated for breast cancer. These questionnaires were selected because they are 2 of the most widely used generic (SF-36) and cancer-specific (FLIC) HRQOL instruments available to clinicians and researchers. Both questionnaires have been used as outcome measures in previous rehabilitative oncology studies,13,14,16 and both are available at no cost in the public domain. The primary goals of this article are to help investigators select outcome measures for use in studies of women with breast cancer and to help clinicians interpret results from these studies.
Specific questions addressed by our study were:
Method
Quality-of-Life Questionnaires
RAND 36-Item Health Survey (SF-36).
The SF-36 is a generic, 36-item questionnaire that measures 8 health-related domains (physical functioning, role-physical [role limitations due to physical impairments], bodily pain, role-emotional [role limitations due to personal or emotional problems], mental health, social functioning, vitality/fatigue, and general health), with higher scores representing a more favorable health status. We scored the questionnaire using the RAND method, which uses the same items as the Medical Outcomes Study (MOS) SF-36,25 but with simplified algorithms for scoring the bodily pain and general health subscales. Validation studies conducted with data from the MOS have shown that the RAND and MOS scoring methods produce equivalent forms of these 2 subscales (r=.99).4,23 The test manual distributed by the Medical Outcomes Trust also includes instructions for computing 2 health summary scores, the Mental Component Summary score and the Physical Component Summary score, by summing weighted subscale scores. A key strength of the SF-36 is that it allows the quality of life of people with breast cancer to be compared with that of adults who are healthy. To make this comparison, raw scores are converted to standardized T-scores (mean=50, SD=10) using US general population norms reported in the test manual.26 Accordingly, a score of 50 represents the national average for adults who are healthy, whereas a score of 40 is 1 standard deviation below the national norm.
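As a minimal sketch of the norm-based conversion just described, the Python below assumes RAND-style scale scoring (items already recoded to 0-100 and averaged within each scale) followed by a linear T-score transformation. The function names and the norm mean and SD in the usage lines are hypothetical placeholders, not values from the SF-36 manual.

```python
from statistics import fmean


def rand_scale_score(item_scores_0_100: list[float]) -> float:
    """Scale score as the mean of answered items already recoded to 0-100.

    Assumption: recoding happens before this call, and unanswered items
    are simply left out of the list.
    """
    if not item_scores_0_100:
        raise ValueError("a scale needs at least one answered item")
    return fmean(item_scores_0_100)


def to_t_score(raw: float, pop_mean: float, pop_sd: float) -> float:
    """Linear conversion to a T-score (mean 50, SD 10) against a norm group."""
    if pop_sd <= 0:
        raise ValueError("population SD must be positive")
    z = (raw - pop_mean) / pop_sd  # distance from the norm in SD units
    return 50.0 + 10.0 * z


# HYPOTHETICAL norms for one subscale (placeholders, not manual values).
NORM_MEAN, NORM_SD = 80.0, 20.0
raw = rand_scale_score([100.0, 50.0, 30.0])  # mean of three items = 60.0
print(to_t_score(raw, NORM_MEAN, NORM_SD))    # prints 40.0 (1 SD below the norm)
```

The example mirrors the interpretation in the text: a raw score 1 norm-group SD below the norm mean maps to a T-score of 40.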
Functional Living Index-Cancer (FLIC).
Also known as the Manitoba Functional Living Cancer Questionnaire, the FLIC is one of the most frequently used quality-of-life instruments designed for use by people diagnosed with cancer.12,22,27-30 The questionnaire contains 22 items with 7-point Likert-type linear analog scales. Researchers in previous studies of women with breast cancer have typically reported results as a total score obtained by summing the 22 unweighted item responses (range=22-154), with higher scores indicating better health status. However, several psychometric studies have shown that the FLIC measures 5 dimensions of health related to quality of life: physical functioning, mental functioning, social functioning, general health/well-being, and gastrointestinal symptoms.22,29,30 Subscale scores for the FLIC are computed by summing individual items, and maximum scores vary, ranging from 14 to 35 points. Four items included in the total score do not appear to correlate with any of the 5 subscales.30 Because this 5-factor structure has demonstrated stability, both when applied to different populations of people with cancer and when compared with other HRQOL measures that have well-defined subscales, we used it to determine whether similarly named FLIC and SF-36 subscales measure the same quality-of-life dimensions.
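The FLIC scoring rules above reduce to unweighted sums. The sketch below assumes each response is an integer from 1 to 7 already oriented so that higher means better (consistent with the 22-154 total range). The item-to-subscale map is a hypothetical placeholder, not the published FLIC factor structure, which should be taken from the psychometric studies cited.

```python
N_ITEMS = 22
MIN_RESPONSE, MAX_RESPONSE = 1, 7  # 7-point scale, so totals span 22-154


def flic_total(responses: list[int]) -> int:
    """Unweighted sum of all 22 item responses.

    Assumption: responses are already oriented so that higher = better.
    """
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} responses, got {len(responses)}")
    if any(not MIN_RESPONSE <= r <= MAX_RESPONSE for r in responses):
        raise ValueError("each response must be an integer from 1 to 7")
    return sum(responses)


def flic_subscales(responses: list[int],
                   item_map: dict[str, list[int]]) -> dict[str, int]:
    """Sum the items (0-based positions) assigned to each subscale."""
    flic_total(responses)  # reuse the length and range checks
    return {name: sum(responses[i] for i in items)
            for name, items in item_map.items()}


# HYPOTHETICAL map for illustration only; not the published FLIC structure.
EXAMPLE_MAP = {
    "physical": [0, 1, 2, 3, 4],
    "mental": [5, 6, 7],
}

answers = [7] * N_ITEMS
print(flic_total(answers))                   # prints 154
print(flic_subscales(answers, EXAMPLE_MAP))  # prints {'physical': 35, 'mental': 21}
```

Because subscale maxima depend only on how many items a subscale contains, a 5-item subscale tops out at 35 points, matching the upper end of the 14- to 35-point range given above.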
Data Analysis
Before the main analysis, descriptive statistics were computed and the results were visually examined for missing cases and normality. Three cases with missing data were deleted from the data set, leaving a total of 110 completed records. Because most subscale scores were negatively skewed and quality-of-life scores generally have not been shown to meet the criteria for interval-level data, nonparametric statistics were selected for the data analyses. Ordinal-level correlation coefficients (Kendall tau-b) were used to assess construct overlap between overall quality-of-life scores (our first research question). Large coefficients between SF-36 and FLIC summary scores would indicate substantial congruence of quality-of-life content between the questionnaires, whereas smaller values would indicate dissimilarities of construct coverage. Differences between groups based on respondent age and length of time since diagnosis of breast cancer were explored using independent t tests (2-tailed, α=.05).
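The Kendall tau-b coefficient used above can be written out directly from pair counts. This plain-Python sketch (the function name is illustrative) uses the O(n²) definition, tau-b = (concordant - discordant) / sqrt(pairs untied on x × pairs untied on y), which corrects for tied ranks; statistical packages such as SciPy's `scipy.stats.kendalltau`, whose default variant is tau-b, use faster algorithms.

```python
import math
from itertools import combinations


def _sign(v: float) -> int:
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def kendall_tau_b(x: list[float], y: list[float]) -> float:
    """Kendall tau-b with the standard correction for ties."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 cases")
    numerator = 0  # concordant pairs minus discordant pairs
    untied_x = 0   # pairs whose x values differ
    untied_y = 0   # pairs whose y values differ
    for i, j in combinations(range(len(x)), 2):
        dx = _sign(x[i] - x[j])
        dy = _sign(y[i] - y[j])
        numerator += dx * dy  # +1 concordant, -1 discordant, 0 if tied
        untied_x += dx * dx
        untied_y += dy * dy
    if untied_x == 0 or untied_y == 0:
        raise ValueError("tau-b is undefined when one variable is constant")
    return numerator / math.sqrt(untied_x * untied_y)


print(kendall_tau_b([1, 2, 2, 3], [1, 2, 3, 4]))  # about 0.913 (one tie on x)
```

Applied to the study, x and y would be paired summary scores from the same respondents (for example, FLIC total and SF-36 Physical Component Summary); those pairings are illustrative here.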
A multitrait-multimethod correlation (tau-b) matrix was constructed to determine whether similarly named subscales of the 2 questionnaires (methods) assess the same underlying quality-of-life dimensions (traits).31 According to this approach, similar HRQOL dimensions measured with different questionnaires (eg, the physical functioning subscales of the SF-36 and the FLIC) would be expected to correlate highly with each other; that is, they should converge. However, different health dimensions measured with different questionnaires (eg, the physical functioning subscale of the SF-36 and the social functioning subscale of the FLIC) and different dimensions of health measured within the same questionnaire (eg, SF-36 physical functioning and social functioning subscales) would be expected to show low (divergent) correlations with each other.
In this study, convergent relationships were expected within each of the 3 major quality-of-life domains: physical well-being (FLIC physical functioning subscale; SF-36 physical functioning, role-physical, and bodily pain subscales), emotional well-being (FLIC mental functioning subscale; SF-36 mental health and role-emotional subscales), and social well-being (FLIC social functioning and SF-36 social functioning subscales). We also assessed convergence between the general health subscales of the FLIC and SF-36. Chi-square statistics were used to test whether the proportion of pair-wise comparisons in which convergent correlations exceeded divergent correlations was greater than expected by chance. For example, the convergent correlation between the physical functioning subscales of the FLIC and SF-36 was compared with all other correlations involving either of these 2 subscales. This procedure was repeated for the 2 remaining convergent correlations within the physical domain (ie, the FLIC physical functioning and SF-36 role-physical correlation and the FLIC physical functioning and SF-36 bodily pain correlation). Together, these 3 convergent correlations yielded 66 pair-wise comparisons with divergent correlations involving these subscales. Under the null hypothesis, the proportion of pair-wise comparisons in which convergent correlations exceed divergent correlations is equal to the proportion of comparisons in which divergent correlations exceed convergent correlations. If subscales within a domain show convergent validity, the proportion of convergent correlations exceeding divergent correlations should be significantly high (χ² test, P<.05).22
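The comparison-counting procedure above can be sketched as follows. Each convergent correlation r(a, b) is compared with every other correlation involving a or b, and the number of "wins" is tested against a 50/50 split. This is a minimal sketch under stated assumptions, not the study's code: the function names are hypothetical, correlations are compared by absolute value, ties do not count as wins, and the test is the standard 2-cell Pearson goodness-of-fit chi-square with 1 df. Because the article does not state its exact computation, values from this sketch need not reproduce the statistics reported in the Results.

```python
import math


def pair_key(a: str, b: str) -> frozenset[str]:
    """Correlations are symmetric, so an unordered pair identifies each one."""
    return frozenset((a, b))


def count_exceedances(corr: dict[frozenset[str], float],
                      variables: list[str],
                      convergent_pairs: list[tuple[str, str]]) -> tuple[int, int]:
    """Return (wins, comparisons) summed over all convergent pairs in a domain.

    Each convergent r(a, b) is compared with every r(a, x) and r(b, x)
    for x not in {a, b}. Assumptions: magnitudes are compared, and a tie
    is not counted as a win.
    """
    wins = comparisons = 0
    for a, b in convergent_pairs:
        target = abs(corr[pair_key(a, b)])
        for anchor in (a, b):
            for other in variables:
                if other in (a, b):
                    continue
                comparisons += 1
                if target > abs(corr[pair_key(anchor, other)]):
                    wins += 1
    return wins, comparisons


def chi_square_vs_half(wins: int, comparisons: int) -> tuple[float, float]:
    """Pearson goodness-of-fit chi-square (1 df) against a 50/50 split."""
    if comparisons == 0:
        raise ValueError("no comparisons to test")
    expected = comparisons / 2
    losses = comparisons - wins
    chi2 = ((wins - expected) ** 2 + (losses - expected) ** 2) / expected
    # Upper-tail probability of chi-square with 1 df: P(Z**2 > chi2).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Toy 4-variable matrix (made-up values) with one convergent pair, A-B.
names = ["A", "B", "C", "D"]
values = {("A", "B"): 0.8, ("A", "C"): 0.3, ("A", "D"): 0.5,
          ("B", "C"): 0.9, ("B", "D"): 0.2, ("C", "D"): 0.1}
corr = {pair_key(a, b): v for (a, b), v in values.items()}
wins, n = count_exceedances(corr, names, [("A", "B")])  # (3, 4)
chi2, p = chi_square_vs_half(wins, n)                   # chi2 = 1.0
```

If the matrix contains the 5 FLIC and 8 SF-36 subscales (13 variables), each convergent pair yields 2 × 11 = 22 comparisons, so the 3 physical-domain pairs give the 66 comparisons described above.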
A contrasted groups approach was used to assess discriminative properties of the SF-36 and FLIC. An instrument is said to possess discriminative validity if a patient group expected to have lower quality of life actually produces lower scores than the group of comparison subjects.32 For this part of the analysis, women treated for breast cancer were assigned to 1 of 2 subgroups based on condition. Participants reporting a previous diagnosis of upper-extremity lymphedema were assigned to the lymphedema group (n=32), and those without lymphedema were assigned to the comparison group (n=78). We expected that quality-of-life scores obtained from women with lymphedema secondary to breast cancer would be lower than scores obtained from women who had breast cancer and did not have lymphedema. Mann-Whitney U tests and effect sizes (difference of group means/standard deviation of the entire sample) were used to evaluate and compare the discriminative powers of both questionnaires across multiple dimensions of health.33 Scores on the FLIC were tested separately from scores on the SF-36 to simulate studies in which either questionnaire was used separately.29 Bonferroni corrections were applied so that the overall false-positive rate for each questionnaire would not exceed 5%. All tests and analyses were performed using SPSS, version 12.1 for Windows.*
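A minimal sketch of the contrasted-groups statistics described above: the Mann-Whitney U statistic, a 2-sided P value from the large-sample normal approximation, the effect size as defined in the text (difference of group means divided by the SD of the entire sample), and a Bonferroni-adjusted per-test alpha. Assumptions not taken from the article: the function names are illustrative, the SD uses the sample (n-1) denominator, and the normal approximation omits tie and continuity corrections, so P values may differ somewhat from SPSS output, particularly when many scores are tied.

```python
import math
from statistics import fmean, stdev


def mann_whitney_u(x: list[float], y: list[float]) -> float:
    """U for group x: number of (x, y) pairs with x > y, ties counted as 1/2."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


def mann_whitney_p(x: list[float], y: list[float]) -> float:
    """2-sided P value, normal approximation without tie/continuity correction."""
    n1, n2 = len(x), len(y)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (mann_whitney_u(x, y) - mean_u) / sd_u
    return math.erfc(abs(z) / math.sqrt(2))


def effect_size(comparison: list[float], lymphedema: list[float]) -> float:
    """Difference of group means divided by the SD of the entire sample."""
    pooled = list(comparison) + list(lymphedema)
    return (fmean(comparison) - fmean(lymphedema)) / stdev(pooled)


def bonferroni_alpha(family_alpha: float, n_tests: int) -> float:
    """Per-test alpha that keeps the family-wise false-positive rate at family_alpha."""
    return family_alpha / n_tests


comparison_scores = [3, 4, 5]  # hypothetical subscale scores
lymphedema_scores = [1, 2, 3]
print(effect_size(comparison_scores, lymphedema_scores))  # about 1.41
print(bonferroni_alpha(0.05, 5))  # per-test alpha for a hypothetical family of 5 tests
```

A positive effect size here means that the lymphedema group scored lower than the comparison group. The number of tests per questionnaire in the Bonferroni call is a placeholder.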
Results
χ²=20.48, P<.001). This finding suggests that the developers of the SF-36 and the FLIC viewed physical well-being in similar ways. The relationship between the FLIC physical functioning and the SF-36 role-physical subscales was particularly noteworthy, exceeding divergent correlations in all 22 pair-wise comparisons (χ²=11.0, P<.001) and indicating that these subscales are potentially interchangeable in women with lymphedema secondary to breast cancer.
Similarly, convergent correlations within the domain of mental well-being exceeded divergent correlations in 35 of the 44 possible comparisons (χ²=7.68, P<.01). Strong evidence was found to support the assumption that the FLIC mental functioning subscale is interchangeable with the SF-36 mental health subscale (χ²=9.09, P<.005) but not the SF-36 role-emotional subscale (χ²=0.82, P<.50). Evidence of convergence also was found between the FLIC and SF-36 subscales representing social functioning (χ²=4.45, P<.05). Tests for convergence within the general health dimension, however, were not significant (χ²=3.27, P<.10), indicating that these similarly named subscales measured somewhat different quality-of-life dimensions in this sample.
Discriminative Validity (Contrasted Groups Approach)
Quality-of-life scores obtained from the total sample and both subgroups of subjects are summarized in Table 4 (FLIC) and Table 5 (SF-36).
Discussion and Conclusions
Several factors may be responsible for these findings, including different concepts of overall health, different scoring methods, and different ways of framing general health questions. The SF-36 general health questions address the patient's health status directly or in comparison with that of other people, whereas the FLIC general health questions ask patients to rate their feelings about their health status. For example, the SF-36 asks respondents to agree or disagree with general health status statements (eg, "My health is excellent," "I am as healthy as anyone I know"), whereas the FLIC asks about health-related moods on the day the questionnaire is completed (eg, "How do you feel today?," "How well do you appear today?"). Individuals who may not feel or look particularly well on a given day but who believe that they are healthy on most days might give themselves low marks when completing the FLIC but might rate their overall health as "excellent" when completing the SF-36.
Findings from our study support a growing body of knowledge demonstrating that the problem of construct divergence between the FLIC and the SF-36 exists across populations with cancer and that this problem extends to other HRQOL inventories as well.22,29,33,34 Kuenstner et al22 compared FLIC and SF-36 subscale scores gathered from a sample of 234 patients with a variety of solid tumors and hematological malignancies and observed patterns of convergence and divergence similar to those found in our study. Specifically, convergent correlations exceeded divergent correlations among subscales assessing physical well-being (41 of 44 comparisons) and mental well-being (44 of 44 comparisons) but not general health status (12 of 24 comparisons). The authors concluded that patients with cancer may associate general health questions from the SF-36 with physical well-being, whereas items included in the general health subscale of the FLIC seem to be associated with subjective feelings related to mental health.22
Controlled assessments of discriminative validity are rare in the field of rehabilitation oncology. To our knowledge, our study is the first to compare generic and condition-specific quality-of-life scores obtained from women with breast cancer with and without secondary lymphedema. Discriminative validity is an essential property for HRQOL questionnaires because failure to discern deficits along important dimensions of well-being may contribute to ceiling effects when these questionnaires are used to assess and compare clinical populations. As expected, FLIC total scores for women with lymphedema were lower than similar scores obtained from the comparison group. Substantial differences in physical well-being also were found when SF-36 Physical Component Summary scores from both groups were compared. However, no differences were found between groups when SF-36 Mental Component Summary scores were examined. When health status profiles developed from each questionnaire were analyzed in detail, it was evident that the discriminative powers of both the SF-36 and the FLIC varied by subscale. The greatest differences between groups (effect sizes=0.5-0.75) were found within the domain of physical well-being for both instruments. In contrast, no differences were found between groups using the mental health subscale of the SF-36 (effect size=0.35), the FLIC social functioning subscale (effect size=0.44), or the FLIC gastrointestinal symptom subscale (effect size=0.1).
When relatively large samples are studied, it is possible for small differences in HRQOL scores to be statistically significant yet clinically unimportant. The minimal clinically important difference (MCID) is the smallest difference in a score that is believed to be important. Although absolute thresholds for MCID are difficult to define, the literature on the SF-36 generally places the MCID at effect sizes of 0.3 to 0.5.4 Statistically significant differences in SF-36 scores collected from women with breast cancer with and without lymphedema during our study consistently fell within these limits, suggesting that those HRQOL differences also may be clinically meaningful. To the best of our knowledge, MCIDs have not been published for the FLIC.
Our findings support previous studies that have shown that women with lymphedema appear to experience greater physical challenges than other women with breast cancer of similar age and length of survival. Kwan et al11 reported that women with lymphedema scored lower than other women with breast cancer on the SF-36 physical functioning subscale but found no differences with regard to social functioning or mental health. Using another multidimensional quality-of-life questionnaire, the Functional Assessment of Cancer Therapy, Beaulac et al6 observed lower overall quality of life and physical functioning but not mental or social well-being in patients with lymphedema. These findings support the belief that quality-of-life questionnaires tend to be most sensitive to differences between patient groups along the physical dimension of health and functioning.2,5
Results from our study also suggest that the SF-36 possesses relatively weak discriminative power with regard to emotional well-being, failing to demonstrate mental health differences between women with lymphedema secondary to breast cancer and women with breast cancer without lymphedema that were detected using the FLIC. This finding is consistent with the results of Broeckel and colleagues,7 who compared SF-36 scores from patients with breast cancer treated with adjuvant chemotherapy with those from a group of age-matched comparison subjects with no history of cancer. In that study, patients who received chemotherapy reported more depressive symptoms on the Center for Epidemiologic Studies Depression Scale than did the comparison subjects. As in our study, large differences in physical functioning were observed between groups, but no group differences were found when SF-36 Mental Component Summary or mental health subscale scores were compared. The authors noted that the SF-36 contains only 2 items addressing anxiety and 3 items assessing depression and suggested that this instrument's coverage of mental health symptoms may be too limited to detect the types of emotional problems experienced by patients with breast cancer. These findings imply that condition-specific measures such as the FLIC may be more sensitive than generic HRQOL measures to the emotional issues that people with breast cancer and lymphedema consider important.
This view is supported by previous reports of concurrent validity between the FLIC and commonly used indexes of depression, anxiety, and other cancer-related symptoms.30 Other investigators7 have suggested that HRQOL assessment of patients with cancer may be further improved by using additional measures assessing specific symptoms or emotional states (eg, depression, anxiety) to augment HRQOL questionnaires. Alternatively, researchers may choose to use a combination of generic and condition-specific health status measures to assess the impact of breast cancer and treatment-related side effects on quality of life. An advantage of these comprehensive outcome assessment strategies is that they have the potential to capitalize on the strengths of various instruments while minimizing their limitations.5 Incorporating and comparing several measures of HRQOL also may contribute to the accurate measurement and understanding of the effects of cancer-related morbidities on perceived health status and life satisfaction.35
A secondary finding of our study was that women with secondary lymphedema averaged 1 standard deviation below the norms for the US population with regard to physical well-being.36 This places them at a level similar to that of people diagnosed with chronic lung disease, arthritis, diabetes, or symptomatic human immunodeficiency virus infection.26,36 These results are consistent with 1 of 2 previous studies comparing SF-36 subscale scores from people with lymphedema secondary to breast cancer with national norms. As in our study, Pain and colleagues14 reported that a substantial proportion of subjects in their sample (n=48) were well below US general population values for physical functioning but not for mental health. In contrast, Velanovich and Szymanski16 found that a small sample of women with secondary lymphedema (N=11) scored well below national norms for SF-36 subscales measuring mental health but not for those measuring physical functioning.
Reasons for these somewhat contradictory results are unclear, but they may be related to differences in subject characteristics, selection procedures, or psychometric properties of the SF-36. The SF-36 has demonstrated an interesting age bias with regard to emotional well-being in women with breast cancer, with older women reporting better mental health outcomes than younger ones.7,12 The patients with lymphedema in the study by Velanovich and Szymanski,16 however, were somewhat older than those in our sample (59.1 versus 50.6 years), so any instrumental age bias should have made their reported emotional well-being better, rather than worse, than that of adults who were healthy. A more likely explanation for these conflicting results may be related to longitudinal improvements in emotional well-being following treatment for breast cancer. In our study, length of survival for women with lymphedema averaged about 2.6 years, ranging from 3 months to 9 years. Health-related quality-of-life scores reported from the study by Velanovich and Szymanski were obtained 6 to 48 months following surgery, but group statistics regarding average length of survival were not provided. Because many of our subjects appear to have had longer periods of time to cope with and adapt to the psychosocial consequences of lymphedema, it seems reasonable to expect that their emotional well-being scores would be higher. A similar coping/adaptation phenomenon also might have contributed to the positive age bias observed in SF-36 mental health scores in previous studies of women with breast cancer.35
In contrast, participants in the comparison group in our study reported remarkably good health when compared with the adults who were healthy and who participated in the Medical Outcomes Study.36 This finding is consistent with results from several short-term (2-5 years) follow-up studies of people with breast cancer.9,12,28,35 Ganz and associates9 studied 864 women surveyed 1 to 5 years following diagnosis of early-stage breast cancer. Profiles on the SF-36 were at or above age-matched population norms for women who were healthy and substantially higher than those of outpatients with other chronic medical conditions. Similar findings have been reported by other investigators comparing quality of life in people with breast cancer with population norms.12,28,35 Total scores on the FLIC obtained from our study also are within the range (124-136) reported from previous studies of women with breast cancer.12,28,29
Because the main objective of our study was to compare the measurement properties of the SF-36 and the FLIC, sociodemographic variables (eg, income, ethnicity/cultural preference, marital status) and clinical data (eg, disease stage, surgical treatment, types of adjuvant therapy) that might influence HRQOL were not collected. Consequently, we cannot comment on the effect that these potentially confounding factors may have had on the physical, psychological, or social dimensions of health reported by the participants in this study. We realize that this omission limits the generalizability of the findings. This weakness, however, should not detract from our findings concerning the measurement properties of the SF-36 and FLIC questionnaires.
For users of HRQOL questionnaires, particularly the FLIC and SF-36, the following implications can be drawn from our findings. When selecting an HRQOL instrument, investigators should not rely simply on the names of the questionnaires and their subscales or domains but also should take into account the content and structure of the individual items. Although subscales of the SF-36 and the FLIC addressing physical, mental, and social well-being appear to measure similar dimensions of health, these subscales and overall HRQOL scores (ie, FLIC total score, SF-36 Physical Component Summary score, and SF-36 Mental Component Summary score) are not interchangeable. In addition, although both the SF-36 and the FLIC appear to be useful for measuring physical functioning in patients with breast cancer, the condition-specific FLIC may be more sensitive to psychological factors influencing the health and well-being of people with breast cancer and secondary lymphedema than the generic SF-36.
Clinicians and researchers should note that the same caveat applies to the interpretation and comparison of study results. When a particular research question has been investigated in different studies using either the SF-36 or the FLIC, readers should recognize that inconsistent findings may be the result of construct divergence between the 2 assessment instruments rather than of true differences in treatment effectiveness or actual quality-of-life outcomes.
In our study, we compared quality-of-life scores obtained using 2 questionnaires: the SF-36 and the FLIC. When only 2 instruments are compared, it is unclear whether low correspondence between similarly labeled subscales, as in the case of the general health subscales of the FLIC and SF-36, is the result of poor statistical properties of one instrument, whether the 2 questionnaires stress different aspects of one domain, or whether a quality-of-life dimension is more difficult to assess than other dimensions.22 Future studies using a third questionnaire may provide additional insight into the likely causes of weak convergent validity observed in our study.
Finally, we should note that instrument validity is contextual and that its evaluation is an iterative process. Individual studies of convergent and discriminative properties provide important evidence but not absolute proof of construct validity.37 That evidence is most relevant when gathered from the populations and settings in which the instrument is intended to be used. Further research comparing various HRQOL instruments is needed to identify meaningful differences among questionnaires and to evaluate their usefulness and reliability within specific subpopulations of people with cancer.
Footnotes
This study was approved by the H. Lee Moffitt Cancer Center and Research Institute's Scientific Review Committee and the Institutional Review Board (Biomedical) of the University of South Florida.
This study was funded in part by the American Cancer Society (#93-032-07 to Dr Wilson) through the University of South Florida/H. Lee Moffitt Cancer Center Institutional Research Grant program.
* SPSS Inc, 233 S Wacker Dr, Chicago, IL 60606-6307.