Research Reports
TS Roddey, PT, PhD, OCS, FAAOMPT, is Associate Professor, School of Physical Therapy, Texas Woman's University, 1130 MD Anderson Blvd, Houston, TX 77030 (USA) (hg_roddey{at}twu.edu). She was a doctoral student, School of Physical Therapy, Texas Woman's University, and Research Fellow, Texas Orthopedic Hospital, Houston, Tex, at the time this research was conducted. Address all correspondence to Dr Roddey
SL Olson, PT, PhD, is Associate Professor, School of Physical Therapy, Texas Woman's University
KF Cook, PhD, is Psychometrician, Center for Healthy Aging (a VA Rehab R&D Center of Excellence), Houston, Tex, and Assistant Professor, Baylor College of Medicine, Houston, Tex
GM Gartsman, MD, is Orthopedic Surgeon, Fondren Orthopedic Group, Houston, Tex
W Hanten, PT, EdD, is Professor, School of Physical Therapy, Texas Woman's University
Dr Roddey, Dr Olson, and Dr Cook provided concept/research design, writing, and data analysis. Dr Roddey and Dr Gartsman provided data collection, Dr Roddey provided project management, and Dr Gartsman provided subjects. Dr Olson, Dr Cook, and Dr Hanten provided consultation (including review of manuscript before submission). The authors acknowledge the Rehabilitation Department of Texas Orthopedic Hospital for their support and the assistance of the employees of the Fondren Orthopedic Group with this project
Submitted July 19, 1999;
Accepted April 24, 2000
| Abstract |
|---|
Key Words: Outcome assessment (health care), Psychometrics, Shoulder, Shoulder joint.
| Introduction |
|---|
A number of shoulder scales have been developed. Three of the most commonly used shoulder outcome scales are the University of California–Los Angeles (UCLA) Shoulder Scale,5 the Simple Shoulder Test (SST),6 and the Shoulder Pain and Disability Index (SPADI).7 There is little published evidence regarding the psychometric properties of these measures. Only one shoulder scale, the SPADI, has undergone psychometric scrutiny with more than one sample of subjects.7–11 We found no studies in which reliability was measured for the SST, and in only one study was validity measured for the SST.9 Despite its frequent use in research, no studies have evaluated either the reliability or the validity of measurements obtained with the UCLA Shoulder Scale.
The test-retest reliability of measurements obtained with the SPADI has been assessed in 2 studies.7,11 Both groups of investigators found the SPADI measurements to be what we would consider reliable (intraclass correlation coefficients [ICCs]=.66–.91), thus supporting its use for group-level comparisons. Shoulder scales are often used, however, not only in research but in clinical settings to document the status of patients. In comparison with the reliability required for group-level comparisons, we contend that much greater reliability is needed to justify a scale's use in individual assessment.12 Only Beaton and Richards11 have addressed the issue of the individual-level reliability of measurements obtained with the SPADI, and they reported that an ICC of .91 was adequate. The reliability of SPADI measurements was further evaluated by Roach et al7 through calculation of internal consistency values, with Cronbach alpha (α) values ranging from .86 to .95.
In several studies,7–10 the SPADI measurements were evaluated for construct and criterion validity. Roach et al7 used factor analysis to evaluate the construct validity of SPADI measurements in a sample of 37 male veterans. They found minimal support for the division of the SPADI items into subscales measuring 2 dimensions (pain and disability). Because their sample was quite homogeneous (male veterans) and grossly undersized for a factor analysis,13 their results should be considered preliminary. In the same study,7 criterion validity was evaluated using active range of motion (AROM) of the shoulder as a "gold standard." As Heald et al10 have noted, there is no evidence to support the use of shoulder AROM to reflect function. Beaton and Richards9 found that AROM measurements obtained during shoulder elevation correlated poorly with scores on both the SPADI and the SST.
Researchers in 3 studies8–10 examined the construct validity of SPADI measurements by comparing subjects' scores with those obtained from general health-related quality-of-life scales. The researchers in one of these studies9 also included the SST in their examination of construct validity. Williams et al8 compared the SPADI with the Medical Outcomes Study 20-Item Short-Form Health Survey (SF-20). The moderate correlation with the physical function and pain components of the SF-20 supported the construct validity of the SPADI measurements. Two groups of researchers9,10 examined the construct validity of measurements obtained with shoulder outcome scales. Heald et al10 compared SPADI scores with scores on the Sickness Impact Profile (SIP). Beaton and Richards9 compared SPADI and SST scores with scores on the Medical Outcomes Study 36-Item Short-Form Health Survey (SF-36). The correlations found between these shoulder-specific measures and general health status measures demonstrated moderate associations.
In summary, of the 3 commonly used shoulder outcome scales (the SPADI, SST, and UCLA), only the SPADI has undergone psychometric scrutiny. Despite the methodological problems in some of these studies, we believe that, overall, they support a conclusion that there is acceptable reliability and validity of data obtained with the SPADI to support its use with group investigations. The UCLA Shoulder Scale and the SST, however, have little or no psychometric evidence of reliability at either the group or the individual level and little support for their use in either clinical evaluation or research regarding shoulder function. Therefore, we argue that the SPADI is the existing "gold standard" of shoulder outcome measures. The purposes of our study were (1) to add to the existing work on the reliability and validity of SPADI measurements and (2) to explore the psychometric properties of 2 other shoulder outcome measures, the UCLA and the SST. Specifically, the aims of our study were:
| Materials and Methods |
|---|
Demographics and surgical history for the sample are reported in Table 1. Fifty-eight percent of the subjects were male (112/192), and 41% (78/192) were female. Two subjects did not report their sex. The average age of the subjects was 47 years (SD=15, range=18–87). Forty-six percent of the subjects (88/192) reported that they had undergone surgery on their shoulder. The median time since surgery was 14 weeks (mean=30.1, SD=76.5, range=1.5–572). Fifty-three percent of the subjects (102/192) reported that they had not had surgery on their shoulder; 2 subjects did not answer that question.
SST.
The SST is a shoulder function scale consisting of 12 items that ask people about their ability to tolerate or perform 12 activities of daily living (ADL). The individual indicates that he or she is able or is not able to do the activity. The SST scores range from 0 to 100 and are reported as the percentage of answered items to which the person responds in the affirmative.6
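The SST scoring rule described above can be sketched in a few lines. This is only an illustration of the stated rule (percentage of answered items with an affirmative response); the function name `sst_score`, the use of `None` for an unanswered item, and the response pattern are assumptions made for the example, not part of the published instrument.

```python
from typing import Optional, Sequence


def sst_score(responses: Sequence[Optional[bool]]) -> float:
    """Return the SST score (0-100): percentage of answered items answered 'yes'.

    `None` marks an unanswered item (an assumption of this sketch); such items
    are left out of both the numerator and the denominator.
    """
    answered = [r for r in responses if r is not None]
    if not answered:
        raise ValueError("no items were answered")
    return 100.0 * sum(answered) / len(answered)


# Hypothetical respondent: 12 items, one left blank, 8 of 11 answered 'yes'.
example = [True, True, False, True, None, True,
           False, True, True, False, True, True]
score = sst_score(example)
print(round(score, 1))
```

Because the denominator is the number of answered items, a skipped item does not count against the respondent.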
SPADI.
The SPADI consists of 2 self-report subscales of pain and disability. The items of both subscales are visual analog scales (VASs). The 5-item pain subscale asks people about their pain during ADL, and each item is anchored by the descriptors "no pain" (left anchor) and "worst pain imaginable" (right anchor). The 8 disability items ask people about their difficulty in performing ADL. These items are anchored with the descriptors "no difficulty" (left anchor) and "so difficult it required help" (right anchor). Each item is scored by measuring the distance from the left anchor to the mark made by the person. Subscales are scored in a 3-part process. First, item scores within the subscale are summed. Second, this sum is divided by the summed distance possible across all items of the subscale to which the person responded. Third, this ratio is multiplied by 100 to obtain a percentage. Higher scores on the subscale indicate greater pain and greater disability. To obtain the SPADI total score, the pain and disability subscales scores are averaged.7
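The 3-part subscale scoring and the averaging step can be sketched as follows. The VAS line length (`VAS_LENGTH`, set here to 10 cm), the function names, and the example marks are assumptions for illustration only; the text specifies the procedure but not these details.

```python
from typing import Optional, Sequence

VAS_LENGTH = 10.0  # assumed length of each VAS line in cm (hypothetical value)


def subscale_score(marks: Sequence[Optional[float]],
                   vas_length: float = VAS_LENGTH) -> float:
    """Score one SPADI subscale as a percentage (0-100).

    `marks` holds the measured distance from the left anchor for each item;
    `None` marks an item the person did not answer (assumption of this sketch).
    """
    answered = [m for m in marks if m is not None]
    if not answered:
        raise ValueError("no items were answered")
    item_sum = sum(answered)                 # step 1: sum the item scores
    possible = vas_length * len(answered)    # step 2: distance possible on answered items
    return 100.0 * item_sum / possible       # step 3: express the ratio as a percentage


def spadi_total(pain_marks: Sequence[Optional[float]],
                disability_marks: Sequence[Optional[float]]) -> float:
    """Average the pain and disability subscale scores."""
    return (subscale_score(pain_marks) + subscale_score(disability_marks)) / 2.0


# Hypothetical respondent: 5 pain items (one unanswered) and 8 disability items.
pain = [5.0, 6.0, None, 4.0, 5.0]
disability = [2.0, 3.0, 1.0, 4.0, 2.0, 3.0, 1.0, 0.0]
print(subscale_score(pain), subscale_score(disability), spadi_total(pain, disability))
```

Note that averaging the two subscale percentages weights the 5 pain items and the 8 disability items equally as groups, rather than weighting all 13 items equally.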
Data Analysis
Reliability.
The internal consistency of the multi-item subscales was assessed by calculating Cronbach alpha values for the subscales (SPADI pain subscale, SPADI disability subscale, and SST). The Cronbach alpha statistic is an estimate of the reliability of a scale's measurements calculated from a single administration of the scale.14 Therefore, in contrast to test-retest statistics, this method is particularly useful for evaluating scales measuring traits that can change over short periods of time. This statistic was not calculated for the UCLA pain or function subscales because these 2 subscales consist of single items.
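As an illustration of how a Cronbach alpha value is obtained from a single administration, the sketch below applies the standard formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores). The data matrix is hypothetical and is not drawn from the study sample.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(scores: Sequence[Sequence[float]]) -> float:
    """Cronbach alpha for a respondents-by-items table of scores.

    Each inner sequence is one respondent's item scores; all respondents must
    have answered the same items.
    """
    n_items = len(scores[0])
    if n_items < 2:
        raise ValueError("alpha requires at least 2 items")
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in scores]) for i in range(n_items)]
    # Sample variance of the respondents' total scores.
    total_variance = variance([sum(row) for row in scores])
    return n_items / (n_items - 1) * (1.0 - sum(item_variances) / total_variance)


# Hypothetical data: 5 respondents, 3 items.
data = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 1],
    [5, 4, 5],
    [3, 3, 4],
]
alpha = cronbach_alpha(data)
print(round(alpha, 3))
```

The formula needs at least 2 items, which is why no value can be computed for a single-item subscale.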
The SEM is a statistic that is used to estimate how reliably a scale estimates an individual's "true score," that is, the score that would be obtained for the person if the scale measured perfectly, without error.12,15 This statistic contrasts with other statistics, such as the Cronbach alpha, that evaluate how reliably a collection of scores can estimate the true mean score of a group of individuals, that is, the mean score that the group would have if the scale measured without error.
A useful property of the SEM statistic is that it can be used to establish a range of "confidence" around an individual score (eg, the score interval within which the examiner could be 95% confident that the individual's true score lies). The SEM estimates were calculated using the following equation:
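The equation itself is not reproduced in this version of the text. As a hedged sketch, the classical test theory formula is SEM = SD × sqrt(1 − r), where SD is the standard deviation of the observed scores and r is a reliability coefficient; whether this matches the authors' equation term for term, and which reliability coefficient they used, are assumptions here. The SD, reliability, and observed score below are hypothetical.

```python
import math
from typing import Tuple


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Classical SEM: SD * sqrt(1 - reliability). Assumed form, see lead-in."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def confidence_interval(observed: float, sem: float,
                        z: float = 1.96) -> Tuple[float, float]:
    """Interval of +/- z * SEM around an observed score (z = 1.96 for 95%)."""
    half_width = z * sem
    return observed - half_width, observed + half_width


# Hypothetical scale: SD of 20 points and reliability of .96.
sem = standard_error_of_measurement(sd=20.0, reliability=0.96)
low, high = confidence_interval(observed=50.0, sem=sem)
print(round(sem, 2), round(low, 2), round(high, 2))
```

Under this formula even a highly reliable scale yields a wide interval for an individual when the score SD is large, because the SEM is proportional to the SD.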
Validity.
To assess the construct validity of the SPADI and SST scores, 2 principal-component factor analyses with varimax rotation were conducted. If the SPADI is measuring pain and disability as separate entities, we would expect a 2-factor solution, with disability items grouping on one factor and pain items grouping on the other factor. The SST is designed to measure a single dimension; therefore, its items are expected to load on a single dimension.
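To make the idea of items "loading on a single dimension" concrete, the sketch below extracts only the first principal component of an inter-item correlation matrix by power iteration and reports the share of variance it explains. It is a simplified illustration: it omits the varimax rotation and the extraction of further components used in the study, and the correlation matrix is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def first_principal_component(corr: Sequence[Sequence[float]],
                              iterations: int = 500) -> Tuple[float, List[float]]:
    """Return (largest eigenvalue, item loadings) of a correlation matrix.

    Uses power iteration from a uniform start vector, which is adequate when
    all inter-item correlations are positive (assumption of this sketch).
    """
    k = len(corr)
    vec = [1.0 / math.sqrt(k)] * k
    for _ in range(iterations):
        nxt = [sum(corr[i][j] * vec[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    # Rayleigh quotient gives the eigenvalue for the converged unit vector.
    eigenvalue = sum(vec[i] * sum(corr[i][j] * vec[j] for j in range(k))
                     for i in range(k))
    # Loadings are the eigenvector scaled by the square root of the eigenvalue.
    loadings = [math.sqrt(eigenvalue) * x for x in vec]
    return eigenvalue, loadings


# Hypothetical 4-item scale in which every pair of items correlates .70.
corr_matrix = [
    [1.0, 0.7, 0.7, 0.7],
    [0.7, 1.0, 0.7, 0.7],
    [0.7, 0.7, 1.0, 0.7],
    [0.7, 0.7, 0.7, 1.0],
]
eigenvalue, loadings = first_principal_component(corr_matrix)
proportion = eigenvalue / len(corr_matrix)  # share of total item variance
print(round(eigenvalue, 3), round(proportion, 3), [round(x, 3) for x in loadings])
```

When one component accounts for most of the variance and every item loads strongly on it, a single-factor interpretation is plausible; items that load weakly on the first component and strongly on a second point toward a 2-factor solution.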
Two other kinds of validity were assessed: convergent validity and discriminant validity. Convergent validity estimates evaluate the degree to which 2 scales measure the same trait.19 Discriminant validity estimates evaluate the degree to which 2 scales measure different traits.19 In the present case, convergent and discriminant validity support the use of scale score to represent the status of patients with regard to their shoulder pain or shoulder function. Strong correlations between subscales purporting to measure the same construct are thought to be evidence of convergent validity. Weak correlations between subscales purporting to measure different constructs are, using the same logic, viewed as evidence of discriminant validity.
Convergent validity and discriminant validity were examined by calculating correlation coefficients between the scores of the SPADI and those of the SST and UCLA. To evaluate the convergent validity of the shoulder function and disability subscales, Spearman correlation coefficients (rs) were calculated between the scores of the SPADI disability subscale and the scores of the UCLA function subscale and the SST. The Spearman correlation coefficient between the SPADI and UCLA pain subscales was also calculated. Discriminant validity was estimated by calculating the Spearman correlation coefficient between pairs of pain and function or disability subscales.
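A Spearman correlation coefficient is a Pearson correlation computed on ranks. The following sketch implements it with average ranks for ties; the function names and the paired scores are hypothetical and serve only to show the calculation.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2.0 + 1.0  # ranks are 1-based
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical scores for 5 people on a disability subscale and the SST.
spadi_disability = [10, 35, 60, 80, 20]
sst = [92, 67, 42, 50, 75]
rs = spearman(spadi_disability, sst)
print(round(rs, 2))
```

The sign of the coefficient depends on the scoring direction of the two scales. Here higher disability scores pair with lower function scores, so the coefficient is negative, and it is the magnitude that bears on convergent validity.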
| Results |
|---|
Standard errors of measurement.
The calculated SEMs and 95% CIs for the SPADI pain and disability subscales and for the SPADI and SST scales as a whole are presented in Table 2. The SPADI scale demonstrated the most precision, with a 95% CI of ±9.3 points. The SST demonstrated the poorest precision, with a 95% CI of ±22.8 points. The SPADI pain and disability subscales had 95% CIs of ±15.3 and ±11.3 points, respectively.
Table 4 shows the loadings of each SST item on the first and second factors. After varimax rotation, the item with the strongest association with the first factor and weakest association with the second factor was the item "Can you wash the back of your opposite shoulder with the affected extremity?" (factor 1=.775, factor 2=.037). The item "Does your shoulder allow you to sleep comfortably?" had the strongest association with the second factor and a very weak association with the first factor (factor 1=.023, factor 2=.764). The 2 items most strongly associated with the second factor (items 1 and 2) were those that might be expected to be strongly influenced by the amount of pain a person is experiencing. These items query respondents regarding their comfort in sleeping and in resting.
| Discussion |
|---|
The SST had lower internal consistency than either the SPADI pain and disability subscales or the SPADI total that combines items of the 2 subscales. It is somewhat curious that the largest Cronbach α value was obtained for the SPADI total (α=.96). When 2 subscales representing different constructs are combined, lower internal consistency would be expected.14 When we calculated in a single analysis the internal consistency of the pain and disability items of the SPADI, however, an even higher internal consistency value was obtained (α=.96). One possible explanation for the higher value is that an increase in reliability is expected with a greater number of items. Another explanation is that the SPADI is essentially unidimensional. The latter possibility is consistent with 2 other findings in the study: (1) the principal-component factor analysis resulted in a single-factor solution, and (2) the correlational analysis revealed a high association between the SPADI subscales.
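The first explanation, that reliability rises with the number of items, can be illustrated with the standardized form of Cronbach alpha, α = k·r̄ / (1 + (k − 1)·r̄), where k is the number of items and r̄ is the mean inter-item correlation. The value of r̄ used below (.60) is hypothetical; the item counts mirror the 5 pain items, 8 disability items, and 13 combined items of the SPADI.

```python
def standardized_alpha(n_items: int, mean_inter_item_r: float) -> float:
    """Standardized Cronbach alpha from item count and mean inter-item correlation."""
    if n_items < 2:
        raise ValueError("alpha requires at least 2 items")
    k, r = n_items, mean_inter_item_r
    return k * r / (1.0 + (k - 1) * r)


# Hold the mean inter-item correlation fixed (hypothetical value) and vary
# only the number of items.
MEAN_R = 0.60
alphas = {k: standardized_alpha(k, MEAN_R) for k in (5, 8, 13)}
for k, value in alphas.items():
    print(k, round(value, 3))
```

With the mean inter-item correlation held constant, alpha increases as items are added, so a higher value for the 13-item total than for either subscale does not by itself show that the total is more homogeneous.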
Standard errors of measurement.
Although the SEM values for the SST were the largest of the scales and subscales analyzed, the SEMs for the SPADI total and the SPADI disability and pain subscales also were quite large. These results demonstrate that a scale can meet generally accepted standards with regard to group-level reliability yet not have adequate precision for the measurement of individuals. Therefore, a scale that functions quite well in comparing group means in a research study may be inadequate for tracking changes in individual patients across the course of their rehabilitation. These results concur with those found by Yarnitsky and colleagues20 when they evaluated the repeatability of VAS measurements of pain. They found that, although much research supports the reliability of VAS pain measurements, the reliability of measurements obtained with such scales for individuals warrants further investigation. In addition, with continued pressure by third-party payers to document treatment efficacy, utilization of a scale that has substantial measurement error can result in misleading information about a patient's progress.
Validity
Construct validity.
Roach et al7 conducted a principal-components factor analysis with varimax rotation of the SPADI pain and disability subscale items and found that they loaded on 2 factors. These 2 factors, however, did not delineate clearly between the pain and disability items. As already noted, the sample size (N=37) for the study by Roach and colleagues was insufficient for a principal-components factor analysis. In addition, their study sample was very homogeneous and failed to include nonveterans and women. In our larger and more heterogeneous sample, a single factor best accounted for the variance in the SPADI item scores. All of the SPADI items had at least moderately high correlations with the first factor (.613–.905), and there was little difference in magnitude between the factor loadings of the disability items and those of the pain items. These results suggest that, in their responses to the SPADI items, people do not distinguish between pain and disability. One possible explanation for this finding is the wording of the SPADI items. The disability items ask respondents to indicate the amount of difficulty they have with specified functions. It is possible that, when people report their difficulty in performing an activity, they consider pain to be part of what makes the activity difficult.
The results of the principal-components factor analysis of the SST provide an interesting contrast with those obtained for the SPADI. The SST purports to measure a single construct. Our results, however, support a 2-factor solution for the SST. Two items had a strong association with a second factor and a weak association with the first factor: "Is your shoulder comfortable with your arm at rest by your side?" and "Does your shoulder allow you to sleep comfortably?" In contrast to these 2 items, the other items of the SST, which loaded heavily on the first factor, query individuals regarding challenges to their strength or flexibility with the involved shoulder. Therefore, the first factor measures what a person can do with his or her shoulder; the second factor measures a person's comfort with the shoulder at rest.
Convergent validity.
To assess convergent validity, pain subscale scores were correlated with the SPADI pain subscale scores and function subscale scores were correlated with the SPADI disability subscale scores. The correlation between the SPADI and UCLA pain subscales was only moderate (rs=.63). This was also the case for pairs of function and disability scales. The strongest correlation was that between the SPADI disability subscale scores and the SST scores (rs=.80). Weaker correlations were found between the SPADI disability subscale scores and the UCLA function subscale scores (rs=.64). The correlation between the UCLA function subscale scores and the SST scores was .60. Generally, we believe there are 2 explanations for a lack of strong association between pairs of subscales developed to measure the same construct. These correlations will be low if (1) either or both subscales of a pair produce a large amount of measurement error or (2) the subscales actually measure different constructs. The relatively high internal consistency values obtained for the subscales in this study indicate that measurement error was not the sole cause of the moderate correlations.
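The reasoning in the last sentence can be made concrete with the classical correction for attenuation, r_corrected = r_observed / sqrt(r_xx × r_yy), which estimates the correlation that would be seen if both scales measured without error. This is offered only as a rough illustration: the formula is defined for Pearson correlations, and the 2 reliability values below are hypothetical, not values reported in the study.

```python
import math


def disattenuated_correlation(observed_r: float,
                              reliability_x: float,
                              reliability_y: float) -> float:
    """Classical correction for attenuation due to measurement error."""
    for rel in (reliability_x, reliability_y):
        if not 0.0 < rel <= 1.0:
            raise ValueError("reliabilities must be greater than 0 and at most 1")
    return observed_r / math.sqrt(reliability_x * reliability_y)


# Observed correlation of .64 (the SPADI disability and UCLA function value
# reported in the text) with hypothetical reliabilities of .90 and .85.
observed = 0.64
corrected = disattenuated_correlation(observed, 0.90, 0.85)
print(round(corrected, 2))
```

If both reliabilities are high, the corrected value remains well below 1, which is consistent with the argument that measurement error cannot fully account for a moderate correlation.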
Discriminant validity.
The strength of the obtained correlations between the SPADI pain and disability subscale scores and between the UCLA pain and function subscale scores does not support the discriminant validity of these scales. The correlation analyses suggest greater convergence than divergence between the subscales of these measures. This finding is exemplified by the fact that the association between the SPADI pain and disability subscale scores was stronger than that between the SPADI pain subscale scores and the UCLA pain subscale scores or between the SPADI disability subscale scores and the scores of either of the function subscales (SST or UCLA function subscale). Our results for the SPADI are similar to those obtained by Roach and colleagues.7
The findings of our study support the following general assertions:
Limitations and Future Research
A limitation of this study was that the sample was obtained from the practice of a single orthopedic surgeon. Although the sample was similar to the general population of patients with shoulder disorders with respect to mean age and sex composition,21 it may contrast on other important variables. For example, information about ethnicity, shoulder diagnosis, and severity or acuteness of the impairment was not collected on the study population, nor did we account for socioeconomic status or vocation. It is possible, therefore, that our sample contrasted with the general population on these variables.
We used a classical approach for obtaining estimates of the SEMs.14 There are a number of drawbacks to such an approach.22 One drawback with substantial implications is that this is a "one-estimate-fits-all" method. It provides a single value to represent the precision of a scale on average across the whole range of possible scores. Scales vary, however, with regard to how precisely they measure people at different levels of the trait being measured.23 As an example, it is possible, indeed likely, in our view, that a scale of shoulder function will measure with unequal precision individuals who have low, medium, and high levels of shoulder function. The classically obtained SEM estimate, because it is an average, underestimates how precisely a scale measures at some trait levels and overestimates its precision at other trait levels.
Another limitation of our study was the use of the SPADI as the "gold standard" to which the other 2 scales were compared in establishing validity. Although the SPADI has certainly undergone more psychometric testing than other shoulder scales, it has been evaluated in fairly homogeneous samples, primarily among male participants.7,8 Future studies should examine the relationship between scores on shoulder outcome scales and external clinical criteria.
| Conclusions |
|---|
| Appendix |
|---|
| Footnotes |
|---|
This study was supported, in part, by a doctoral research fellowship awarded to Ms Roddey by Texas Orthopedic Hospital and Columbia/HCA.
| References |
|---|