Research Reports
IP Hsueh, OT, MA, is Assistant Professor, School of Occupational Therapy, College of Medicine, National Taiwan University, Taipei, Taiwan
WC Wang, PhD, is Professor, Department of Psychology, Chung-Cheng University, Chiayi, Taiwan
CH Wang, PT, BS, is Professor, School of Physical Therapy, College of Medical Technology, Chung-Shan Medical University, and Department of Physical Therapy, Chung-Shan Medical University Rehabilitation Hospital, Taichung, Taiwan
CF Sheu, PhD, is Professor, Institute of Cognitive Science, National Cheng Kung University, Tainan, Taiwan
SK Lo, PhD, is Professor, Faculty of Health and Behavioral Sciences, Deakin University, Melbourne, Australia
JH Lin, PT, PhD, is Professor, Faculty of Physical Therapy, Kaohsiung Medical University, Kaohsiung, Taiwan
CL Hsieh, OT, PhD, is Professor and Chair, School of Occupational Therapy, College of Medicine, National Taiwan University, and Department of Physical Medicine and Rehabilitation, National Taiwan University Hospital, 4F, 17 Shiujou Rd, Taipei 100, Taiwan
(mike26{at}ha.mc.ntu.edu.tw) Address all correspondence to Dr Hsieh
Submitted April 16, 2005;
Accepted January 30, 2006
| Abstract |
|---|
.91). The agreement between the subscale scores (Rasch estimates) of the S-STREAM and those of the STREAM was excellent (ICC of ≥.99, with a lower limit for the 95% confidence interval of ≥.985), indicating good concurrent validity of the S-STREAM with the STREAM. Discussion and Conclusion. The S-STREAM demonstrates high Rasch reliability, unidimensionality, and concurrent validity with the STREAM in patients with stroke. Furthermore, the S-STREAM is efficient to administer, as it consists of only half the number of items in the original STREAM. Additional studies to examine other psychometric properties (eg, predictive validity and responsiveness) of the S-STREAM or its psychometric properties in various recovery stages after stroke are needed to further establish its utility in both clinical and research settings.
Key Words: Motor function, Psychometrics, Rasch model, Stroke.
| Introduction |
|---|
The Stroke Rehabilitation Assessment of Movement (STREAM) instrument was designed to provide a comprehensive and quantitative evaluation of voluntary movements (ie, an impairment measurement) and basic mobility (ie, a disability measurement) in patients with stroke.4 The STREAM consists of three 10-item subscales: upper-limb movements, lower-limb movements, and mobility. The psychometric characteristics of the STREAM have been shown by classical test theory to be satisfactory.2–6 The STREAM is preferred over other related impairment or disability measures (eg, the Box and Block Test, the Berg Balance Scale, gait speed testing, the Timed "Up & Go" Test, and the Barthel Index) for monitoring recovery from a stroke at the acute stage, as those measures appeared not to focus on the goals of immediate therapy during this period.5 Furthermore, those measures had limited abilities to discriminate or evaluate patients with stroke because the Box and Block Test, the Berg Balance Scale, gait speed testing, and the Timed "Up & Go" Test showed floor effects in patients with severe stroke, whereas the Barthel Index showed ceiling effects in patients with mild stroke.5,7–9
However, the 3 subscales of the STREAM have never been tested for unidimensionality (one type of construct validity); such testing is required to justify the summation of scores to quantify motor function in patients with stroke. Only items measuring the same, unique dimension (construct) should be retained in a measure. Furthermore, the extremely high internal consistency of the STREAM (ie, the Cronbach alpha value was found to be as high as .98 for each of the subscales)3 indicates possible redundancy among the items. These observations suggest the potential for shortening the STREAM.
Standard Rasch analysis enables the examination of whether items from a scale constitute a unidimensional construct10,11 so as to construct a concise scale.12 However, when an instrument consisting of more than one subscale (eg, the STREAM) is to be calibrated, it is inefficient to apply the standard unidimensional Rasch model separately to each subscale. The unidimensional approach ignores correlations between latent traits (ie, the constructs of the subscales) and thus may yield imprecise measurements of the construct (or characteristic) to be measured, especially when the subscales are short. On the other hand, the multidimensional Rasch model simultaneously calibrates all subscales and therefore uses the correlations to increase measurement precision.13,14 Theoretically, it may be difficult to conceive of constructs that are independent in the movement domains after stroke. Therefore, the multidimensional Rasch model takes into account the between-subscale correlations to increase measurement precision: the higher the correlations, the greater the measurement precision.15,16 In other words, short subscales, if moderately correlated, still can yield precise measurements with the multidimensional approach. Because the 3 subscales of the STREAM are highly correlated with each other,2 the multidimensional approach can be useful in simplifying the STREAM.
To improve administration efficiency, we aimed to shorten the 30-item, 3-subscale STREAM to produce a simplified STREAM (S-STREAM) by using the multidimensional Rasch model. We examined the psychometric properties of the S-STREAM (including Rasch reliability, unidimensionality, and concurrent validity with the STREAM) in subjects with stroke.
| Method |
|---|
Procedure
The STREAM, with instructions in Chinese, was administered by the same physical therapist to all of the participants in the 5 rehabilitation departments. The intrarater reliability of data obtained by the physical therapist was satisfactory (intraclass correlation coefficient [ICC] of .94). Demographic characteristics and comorbidity data for the participants were collected from medical records.
Instrument
Items of the STREAM3 for voluntary movements of the limbs are scored on a 3-point scale (0=unable to perform the test movement, 1=able to perform the test movement only partially, and 2=able to complete the test movement). Mobility items are scored on a 4-point scale (0=unable to perform the test movement, 1=able to perform the test movement only partially, 2=able to complete the test movement with a mobility aid, and 3=able to complete the test movement without an aid). Thus, each of the 10-item limb movement subscales was scored out of 20 points, and the 10-item mobility subscale was scored out of 30 points.
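The raw scoring rules just described can be sketched in a few lines of Python. This is only an illustration of the stated rules (10 items per subscale, limb items scored 0 to 2, mobility items scored 0 to 3); the function and constant names are hypothetical and are not taken from the STREAM manual or from the software used in this study.

```python
# Illustrative sketch of STREAM raw scoring; all names here are hypothetical.
LIMB_MAX = 2            # limb movement items are scored 0, 1, or 2
MOBILITY_MAX = 3        # mobility items are scored 0, 1, 2, or 3
ITEMS_PER_SUBSCALE = 10


def subscale_raw_score(item_scores, max_per_item):
    """Sum the item scores of one subscale after checking count and range."""
    if len(item_scores) != ITEMS_PER_SUBSCALE:
        raise ValueError("each STREAM subscale has 10 items")
    for score in item_scores:
        if not isinstance(score, int) or not 0 <= score <= max_per_item:
            raise ValueError(f"item score {score!r} is outside 0..{max_per_item}")
    return sum(item_scores)


def stream_raw_scores(upper, lower, mobility):
    """Return the 3 raw subscale totals (maxima of 20, 20, and 30 points)."""
    return {
        "upper_limb": subscale_raw_score(upper, LIMB_MAX),
        "lower_limb": subscale_raw_score(lower, LIMB_MAX),
        "mobility": subscale_raw_score(mobility, MOBILITY_MAX),
    }
```

These raw totals are ordinal sums; they are not the Rasch estimates analyzed in the remainder of the article.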
Data Analysis
The unidimensionality of the 3 subscales of the STREAM was examined with WINSTEPS.18 The variance-covariance matrix (and the correlation matrix) for the 3 latent traits (ie, the constructs of the 3 subscales of the STREAM) was computed with ConQuest,19 which was developed for the multidimensional random-coefficients multinomial logit model (MRCMLM).13 A brief description of the MRCMLM is given in the Appendix.
To examine the unidimensionality of each subscale, infit and outfit statistics were used to examine whether the data fit the expectation of the Rasch rating scale model (RSM). The infit mean square (MNSQ) is sensitive to unexpected behavior affecting responses to items near the subject's proficiency measure (eg, motor status); the outfit MNSQ is sensitive to unexpected behavior on items far from the subject's motor status. Items with infit or outfit MNSQ values of greater than 1.4 were taken to indicate potential misfit.20 The MNSQ can be transformed to a standardized z value (ZSTD), which, for large samples, approximately follows the standard normal distribution when the items fit the expectation of the model. Items with both infit and outfit ZSTD values beyond ±2.58 (the 2-tailed area of the standard normal curve beyond ±2.58 is 0.01) were considered to have poor fit.
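For readers unfamiliar with these statistics, the following minimal Python sketch shows the usual textbook definitions of the infit and outfit mean squares for a single item, together with the 2 screening rules stated above. It assumes that the model-expected scores and model variances have already been produced by a Rasch program; the function names are hypothetical, and the sketch is not the WINSTEPS implementation.

```python
# Minimal sketch of item fit screening; names are hypothetical.

def item_fit_mnsq(observed, expected, variance):
    """Return (infit, outfit) mean squares for one item across persons.

    observed[n]: score of person n on the item
    expected[n]: model-expected score of person n on the item
    variance[n]: model variance of that score (must be positive)
    """
    squared_residuals = [(x - e) ** 2 for x, e in zip(observed, expected)]
    # Outfit: unweighted mean of the squared standardized residuals.
    outfit = sum(r / v for r, v in zip(squared_residuals, variance)) / len(squared_residuals)
    # Infit: information-weighted version, summed squared residuals over summed variances.
    infit = sum(squared_residuals) / sum(variance)
    return infit, outfit


def classify_fit(infit_mnsq, outfit_mnsq, infit_zstd, outfit_zstd,
                 mnsq_cut=1.4, zstd_cut=2.58):
    """Apply the 2 criteria described in the text to one item."""
    return {
        # Either mean square above 1.4 signals a potential misfit.
        "potential_misfit": infit_mnsq > mnsq_cut or outfit_mnsq > mnsq_cut,
        # Both standardized values beyond +/-2.58 signal poor fit.
        "poor_fit": abs(infit_zstd) > zstd_cut and abs(outfit_zstd) > zstd_cut,
    }
```

The ZSTD values are passed in rather than computed because the standardizing transformation depends on quantities that a Rasch program reports directly.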
In addition, when items fit the expectation of the model, the residuals (observed scores minus expected scores) should be distributed randomly. A principal components analysis was conducted to determine whether any dominant component existed among the residuals. If dominant components were found, then the unidimensionality assumption was violated.
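The residual check described above can be illustrated with a small, dependency-free Python sketch. It assumes that each item contributes a column of standardized residuals (one value per subject), builds the between-item correlation matrix, and finds its largest eigenvalue by power iteration. The names are hypothetical, and no cutoff for a "dominant" component is applied because none is given in this section.

```python
# Sketch: largest principal component of item residual correlations.
# Names are hypothetical; this is not the routine used by WINSTEPS.

def correlation_matrix(columns):
    """Pearson correlations between residual columns of equal length."""
    centered = []
    for col in columns:
        m = sum(col) / len(col)
        centered.append([v - m for v in col])
    norms = [sum(v * v for v in col) ** 0.5 for col in centered]
    k = len(columns)
    return [
        [sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
         for j in range(k)]
        for i in range(k)
    ]


def largest_eigenvalue(matrix, iterations=500):
    """Power iteration for the dominant eigenvalue of a symmetric matrix."""
    k = len(matrix)
    vec = [1.0 + 0.01 * i for i in range(k)]  # arbitrary nonzero start vector
    value = 0.0
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(k)) for i in range(k)]
        norm = sum(v * v for v in nxt) ** 0.5
        if norm == 0.0:
            return 0.0
        vec = [v / norm for v in nxt]
        # Rayleigh quotient of the normalized vector estimates the eigenvalue.
        value = sum(vec[i] * sum(matrix[i][j] * vec[j] for j in range(k))
                    for i in range(k))
    return value
```

Because the trace of a correlation matrix equals the number of items, dividing the largest eigenvalue by the item count gives the share of residual variance carried by the first component.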
Rasch reliability, which can be viewed as the counterpart of classical test reliability (eg, the Cronbach alpha), was calculated.10,20 Reliability coefficients of greater than .7 were considered good for group comparisons, whereas those greater than .9 were considered good for individual comparisons.21
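One common way to express a Rasch person reliability is the ratio of "true" variance (observed variance of the person estimates minus the mean error variance) to observed variance. The Python sketch below uses that definition together with the .7 and .9 criteria quoted above. The exact estimators used by WINSTEPS and ConQuest are not described in this section, so the formula here is an assumption for illustration, and the function names are hypothetical.

```python
# Sketch of a separation-type reliability coefficient; names are hypothetical.
from statistics import mean, pvariance


def separation_reliability(estimates, standard_errors):
    """(observed variance - mean error variance) / observed variance."""
    observed_var = pvariance(estimates)
    error_var = mean([se ** 2 for se in standard_errors])
    # Truncate at 0 when error variance exceeds observed variance.
    return max(0.0, (observed_var - error_var) / observed_var)


def interpret(reliability):
    """Map a coefficient onto the criteria cited in the text."""
    if reliability > 0.9:
        return "individual comparisons"
    if reliability > 0.7:
        return "group comparisons"
    return "below criterion"
```

For example, 3 person estimates of -2, 0, and 2 logits, each with a standard error of 0.5 logits, give a coefficient of about .91 under this definition.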
The appropriateness of the scoring levels in each item of the STREAM was investigated with the RSM. The RSM is useful for polytomous items in a scale that share the same rating scale structure (eg, all items are rated 0, 1, or 2). Estimates of the threshold difficulty between the adjacent scoring levels can be used to examine the appropriateness of the scoring points of a test.20 If disorderings of the step difficulty (ie, the difficulty of a higher step was lower than that of its adjacent lower step) between any 2 adjacent levels were found, then the levels of scaling of the items might be reorganized to achieve suitable scaling.
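The rating scale structure and the check for disordered thresholds can be made concrete with the short Python sketch below. It uses the standard form of the RSM, in which the log-odds of adjacent categories depend on the person measure, the item difficulty, and a threshold shared by all items on the scale. The parameter names (`theta`, `delta`, `thresholds`) are hypothetical labels chosen for this illustration.

```python
# Sketch of rating scale model probabilities and a threshold-order check.
import math


def rsm_category_probabilities(theta, delta, thresholds):
    """Return P(X = k) for k = 0..m, where m = len(thresholds)."""
    # Category k accumulates (theta - delta - tau_j) for j = 1..k;
    # category 0 corresponds to an empty sum.
    logits = [0.0]
    for tau in thresholds:
        logits.append(logits[-1] + (theta - delta - tau))
    peak = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(v - peak) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


def disordered_steps(thresholds):
    """Indices j at which threshold j + 1 is lower than threshold j."""
    return [j for j in range(len(thresholds) - 1)
            if thresholds[j + 1] < thresholds[j]]
```

An empty list from `disordered_steps` corresponds to the ordered case, in which each scoring level is the most probable response over some range of the person measure.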
After the unidimensionality and appropriate scoring levels in each item of the STREAM were established, we attempted to reduce the length of the test further while maintaining its psychometric properties. Each of the 3 subscales of the STREAM was shortened to produce the S-STREAM on the basis of 2 criteria: content representativeness, assessed by a panel of therapists (2 physical therapists and 2 occupational therapists who each had more than 10 years of experience in stroke rehabilitation); and difficulty diversity, that is, even scattering of the difficulties of the selected items over the range of the difficulty continuum.
For each subject, the multidimensional form of the RSM can provide estimates for the 3 subscale scores for both the S-STREAM and the STREAM. We used the RSM estimates for each subscale of the STREAM as the gold standard in this study. Because the Rasch estimates for each subscale have different score ranges, all estimates were linearly transformed to a range of 0 to 100 to facilitate comparisons. The relationship and agreement among corresponding Rasch estimates for subscale scores (ie, the concurrent validity of the S-STREAM with the STREAM) were examined with the Pearson correlation coefficient (r) and the ICC(3,1),22 respectively. Correlation coefficients of greater than .6 indicate acceptable concurrent validity.23
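The 3 computations in this paragraph (linear rescaling to 0 to 100, the Pearson correlation, and the ICC[3,1]) can be sketched in plain Python as follows. The ICC uses the usual 2-way analysis-of-variance form, (BMS - EMS) / (BMS + (k - 1)EMS), with k = 2 measurements per subject. The bounds passed to the rescaling function are an assumption: this section does not state which minimum and maximum were used. All function names are hypothetical.

```python
# Sketch of the agreement statistics; names and rescaling bounds are assumptions.
from statistics import mean


def rescale_0_100(values, low, high):
    """Linearly map values from the interval [low, high] onto [0, 100]."""
    return [100.0 * (v - low) / (high - low) for v in values]


def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def icc_3_1(x, y):
    """ICC(3,1) for 2 measurements per subject (2-way mixed, consistency)."""
    n, k = len(x), 2
    rows = list(zip(x, y))
    grand = mean([v for row in rows for v in row])
    ss_rows = k * sum((mean(row) - grand) ** 2 for row in rows)
    ss_cols = n * ((mean(x) - grand) ** 2 + (mean(y) - grand) ** 2)
    ss_total = sum((v - grand) ** 2 for row in rows for v in row)
    ss_error = ss_total - ss_rows - ss_cols
    bms = ss_rows / (n - 1)               # between-subjects mean square
    ems = ss_error / ((n - 1) * (k - 1))  # residual mean square
    return (bms - ems) / (bms + (k - 1) * ems)
```

Here `x` and `y` would hold the transformed estimates for the same subjects from the S-STREAM and the STREAM, respectively, for one subscale.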
| Results |
|---|
A multidimensional analysis with ConQuest was performed for the remaining 27 items (ie, 8 items from the upper-limb movement subscale, 9 items from the lower-limb movement subscale, and 10 items from the mobility subscale). Table 2 shows the correlation matrix for the STREAM, which revealed that the underlying latent traits of the subscales of the STREAM were highly correlated, with Pearson coefficients of between .78 and .90. Table 3 shows that the Rasch reliability for the 3 subscales was good (reliability coefficients of ≥.86).20 Moreover, the 3 subscales of the STREAM showed better reliability when the multidimensional approach was used (reliability coefficients of ≥.93) than when the unidimensional approach was used (reliability coefficients of ≥.86).
The threshold difficulty estimates within each subscale were rather far apart (≥2.18 logits). In addition, the ordering of the threshold difficulty estimates was not reversed.
Table 3 shows that the use of the multidimensional approach with the S-STREAM resulted in high test reliability (Rasch reliability coefficients of ≥.91) for the 3 subscales. These results indicate that the 3 subscales of the S-STREAM can yield very precise estimates for individual subjects. When the unidimensional approach was used, the test reliability values were .85, .88, and .94 for the upper-limb movement, lower-limb movement, and mobility subscales of the S-STREAM, respectively.
The agreement between each pair of subscales was excellent (transformed scores of 0–100), with ICCs (95% confidence intervals) of .99 (.993–.995), .99 (.989–.993), and .99 (.985–.990), for the upper-limb movement, lower-limb movement, and mobility subscales, respectively. Furthermore, the Pearson correlation coefficients for the multidimensional Rasch estimates for the STREAM and the S-STREAM were all .99 for the 3 subscales. These results indicate that each subscale of the S-STREAM demonstrates high concurrent validity with the corresponding subscale of the STREAM.
| Discussion |
|---|
There are 2 major advantages of using the S-STREAM. First, it is simple and quick to use for patients with stroke compared with the original STREAM. As the S-STREAM contains only half the number of items in the original STREAM, the 15-item S-STREAM can be administered within 10 minutes, that is, half the time required to administer the original STREAM. Rapid assessment is a clinically important feature of this simplified version of the STREAM, as long tests can take a substantial amount of time to complete and may place unreasonable demands upon the respondents, especially in instances in which they may be seriously unwell, as in the case of stroke. Rapid and accurate assessment of functional outcomes in patients with stroke therefore will provide benefits to both clinicians and patients.
A second advantage of using the S-STREAM is that the Rasch estimates for the 3 subscales can be viewed as interval-level measurements.10 In contrast, most measures currently used in the assessment of patients with stroke use ordinal-level measurements. For an ordinal scale, a given difference in scores at one point on the scale does not necessarily represent the same amount of functional change as an identical difference at another point on the scale.24 Interval scores, rather than ordinal scores, can provide a more precise reflection and better resolution of disease impact, differences between individuals and groups, and treatment effects.25 Furthermore, an ordinal scale precludes the use of standard parametric statistical inferences. Because most statistical techniques assume that the data are at least on an interval scale, the Rasch estimates for the S-STREAM are recommended for future applications.
With the multidimensional approach, the between-subscale correlations are taken into account to improve measurement precision. Patients with the same raw upper-extremity scores but with different lower-extremity scores or mobility scores would have different Rasch estimates for their upper-extremity scores. The Rasch estimates for each subscale of the S-STREAM derived from the multidimensional analysis cannot be obtained by summing the raw scores and using a simple Rasch transformation table, as in the unidimensional analysis. Because the transformation table for the multidimensional analysis of the S-STREAM is very long, we have developed a computer program to transform the raw scores for each subscale of the S-STREAM to the Rasch scores. The program is easy to run on common PC platforms. To improve the dissemination of the program and the S-STREAM,26 the related materials can be found at http://ccms.ntu.edu.tw/
clhsieh/s-stream/. Even if some patients do not respond to all items, their Rasch scores still can be estimated and compared because, with the use of the models of the Rasch family (or item response models in general), the estimation of a patient's latent traits is based on the patient's observed item responses.11
In this study, multidimensional Rasch analysis was shown to be a useful tool for reducing the items of a measure while maintaining the measurement reliability and validity (eg, the Rasch reliability coefficients of the S-STREAM were above the preset criterion of .9, and the subscales of the S-STREAM were highly associated with the corresponding subscales of the STREAM). Furthermore, the multidimensional Rasch model yielded a large number of estimates of a subject's motor function (eg, 191 estimates for the upper-limb movement function in this study) compared with the raw scores of the S-STREAM (eg, 0–10 for the upper-limb movement function). These additional estimates of motor function are likely to improve the psychometric properties (eg, responsiveness and discriminative capacity) of the S-STREAM, although further validation is warranted.
It also should be noted that direct estimation of the correlation among latent traits is possible only for the multidimensional approach and not for the unidimensional one.15,16 Rasch analysis can achieve even more efficient and precise measurements when computerized adaptive testing (CAT)27–29 is used; CAT involves the use of a computer to administer items to respondents and allows respondents' levels of function to be estimated as precisely as desired (ie, to reach a preset reliability level). Because the impacts after stroke are multiple and a great deal of time and effort is needed to administer the measures that assess various impacts, it seems promising to combine both the multidimensional approach and CAT to simplify or elaborate on functional measurements in patients with stroke.14
The appropriateness of scoring levels refers to whether or not the motor functions of participants can be differentiated by their responses as clearly as the levels allow.20 Recent studies30,31 have shown that a larger number of scoring points may not lead to a finer differentiation of participants. The items of the subscales of the STREAM are on a 3-point or 4-point ordinal scale, but the appropriateness of the scoring levels of the STREAM has rarely been examined. Our study is the first to determine the appropriateness of its scaling in subjects with stroke. We found that the threshold difficulty estimates within each subscale were rather far apart and without disorderings (ie, the ordering of the threshold difficulty of the levels was reasonable). Therefore, the rating scales of the STREAM were supported, indicating that they could differentiate the motor status of subjects very well.
Any measurement tool requires an extensive psychometric examination for the purposes of understanding its particular strengths and limitations.32 Additional studies to examine other psychometric properties (eg, predictive validity and responsiveness) of the S-STREAM are warranted. Furthermore, patients with stroke at the acute or subacute stage receive greater intensity of motor rehabilitation and assessment than do those at the chronic stage. However, more than half of the subjects in this study had had a stroke more than 1 year before the study; thus, the psychometric properties of the S-STREAM at the acute and subacute stages remain largely unknown. Further investigations of the psychometric properties of the S-STREAM at various recovery stages after stroke are needed to establish its utility in both clinical and research settings. Direct psychometric and practical (utility) comparisons between the S-STREAM and other related impairment and disability measures (eg, the Fugl-Meyer Motor Test and the Rivermead Mobility Index) also are needed so that prospective users can select a measure on the basis of empirical data.
| Conclusion |
|---|
| Appendix. |
|---|
| Footnotes |
|---|
The research protocol was approved by local institutional review boards.
This study was supported by research grants from the National Science Council (NSC 93-2314-B-002-033 and NSC 94-2314-B-002-078) and the National Health Research Institute (NHRI-EX94-9204PP).
| References |
|---|