PTJ
PHYS THER
Vol. 79, No. 1, January 1999, pp. 8-23


Research Reports

Reliability of Scores on the Stroke Rehabilitation Assessment of Movement (STREAM) Measure

Kathy Daley, Nancy Mayo and Sharon Wood-Dauphinée

K Daley, MSc(Rehabilitation Science), BSc(PT), was a graduate student in rehabilitation science, McGill University, Montreal, Quebec, Canada, when this study was completed
N Mayo, PhD(Epidemiology), MSc(Applied), BSc(PT), is Research Scientist, Royal Victoria Hospital, Montreal, Quebec, Canada, and Assistant Professor, Faculty of Medicine, Department of Epidemiology, and School of Physical and Occupational Therapy, McGill University. Address all correspondence to Dr Mayo at The STREAM Research Group, School of Physical and Occupational Therapy, McGill University, Davis House, 3654 Drummond St, Montreal, Quebec, Canada H3G1Y5
S Wood-Dauphinée, PhD(Epidemiology), MSc(Applied), BSc(PT), is Professor and Director, School of Physical and Occupational Therapy, and Professor, Department of Medicine and Department of Epidemiology and Biostatistics, McGill University


Submitted September 12, 1997; Accepted May 22, 1998


    Abstract
 
Background and Purpose. The Stroke Rehabilitation Assessment of Movement (STREAM) is a new clinical measurement tool for evaluating the recovery of voluntary movement and basic mobility following stroke. This article presents the results of 3 substudies examining the reliability (interrater and intrarater) and internal consistency of STREAM scores. Subjects and Methods. A "direct-observation reliability study" was conducted on 20 patients who had strokes and were in a rehabilitation setting. Pairs of raters from a group of 6 participating therapists provided data to judge interrater agreement. A "videotaped assessments reliability study" was done to assess intrarater and interrater agreement on the scoring of videotaped performances using the STREAM measure; it involved 4 videotaped assessments that were viewed and rated on 2 occasions by 20 physical therapists. The internal consistency of the STREAM scores was evaluated for 26 patients who had strokes and who demonstrated the full range of motor ability. Results. The reliability of the STREAM scores was demonstrated by generalizability correlation coefficients of .99 for total scores and of .96 to .99 for subscale scores. The internal consistency of the STREAM scores was demonstrated by Cronbach alphas of .965 to .979 on the subscales and .984 overall. Conclusion and Discussion. These high levels of reliability support the use of the STREAM instrument for the measurement of motor recovery following stroke. Further work on the validity and responsiveness of the STREAM measure is in progress.

Key Words: Cerebrovascular accident • Motor recovery • Outcome measure • Reliability


    Introduction
Although a number of instruments are available to measure the recovery of movement following stroke,1–10 we believe that they have not been widely used in clinical practice. A Canadian survey*2 revealed that, although several of these outcome measures were being used for research purposes,11–16 they were being used routinely in less than 5% of physical therapy departments. Lengthy administration time, complexity of scoring, and dependence on equipment were cited as barriers to routine use in clinical settings.


    Background of the STREAM Measure
Researchers and clinicians at the Jewish Rehabilitation Hospital (JRH), Laval, Quebec, Canada, collaborated to produce the original Stroke Rehabilitation Assessment of Movement (STREAM) measure in 1986. The instrument was designed for use in physical therapy departments to provide a comprehensive, objective, and quantitative evaluation of the motor functioning of individuals with stroke. The instrument was designed to be quick and simple to administer, and it fit very well into the routine clinical assessment scheme. In the first phase of the instrument's development, the items and scoring of the original STREAM measure were subjected to review by 2 panels of experts consisting of a total of 20 physical therapists. Based on the panels' recommendations, the original instrument was refined. This intermediate "test version" of the STREAM measure then underwent preliminary evaluations for the reliability and internal consistency of scores. Item reduction was then carried out.

The final version of the STREAM measure consists of 30 items or test movements that are equally distributed among 3 subscales: upper-limb movements, lower-limb movements, and basic mobility items. The STREAM scoring form, including the criteria for scoring the items, is presented in the Appendix. Limb movements are scored on a 3-point scale. Mobility items are scored on a 4-point scale similar to that used for scoring limb movements, except that a category has been added to allow for independence with the help of a mobility aid. Thus, the maximum raw total STREAM score is 70, with each of the limb subscales scored out of 20 points and the mobility subscale scored out of 30 points. To allow for the possibility that occasionally an item cannot be scored (eg, because of restricted range of motion or pain), the subscale and total scores may be transformed to scores out of 100. The procedure for transforming the scores is provided in the STREAM test manual.†1 Further details of the development and validation of the content of the STREAM measure are presented elsewhere.17 This article reports the results of reliability studies carried out on the STREAM measure.
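The exact transformation is given only in the STREAM test manual, so the following is a hypothetical sketch rather than the manual's procedure. It assumes that a score out of 100 is obtained by dividing the points earned by the points possible on the items that could actually be scored, with unscorable items ("X") coded as None. The item maxima (2 for limb items on the 3-point scale, 3 for mobility items on the 4-point scale) follow from the subscale maxima of 20 and 30 points for 10 items each; the function names are illustrative.

```python
# Hypothetical sketch: rescale STREAM scores to 0-100 when some items are "X".
# Assumption (not from the STREAM manual): score = obtained / possible x 100,
# where "possible" counts only the items that could be scored.

LIMB_MAX = 2      # limb items: 3-point scale (0, 1, 2)
MOBILITY_MAX = 3  # mobility items: 4-point scale (0, 1, 2, 3)


def subscale_out_of_100(item_scores, item_max):
    """Rescale one subscale, skipping items recorded as None ("X")."""
    scored = [s for s in item_scores if s is not None]
    if not scored:
        raise ValueError("no scorable items in this subscale")
    return 100.0 * sum(scored) / (item_max * len(scored))


def total_out_of_100(upper, lower, mobility):
    """Pool obtained and possible points across all scorable items."""
    obtained = possible = 0
    for scores, item_max in ((upper, LIMB_MAX), (lower, LIMB_MAX), (mobility, MOBILITY_MAX)):
        scored = [s for s in scores if s is not None]
        obtained += sum(scored)
        possible += item_max * len(scored)
    return 100.0 * obtained / possible


# Example: one upper-limb item could not be scored because of pain.
upper = [1] * 9 + [None]
lower = [2] * 10
mobility = [3] * 10
print(subscale_out_of_100(upper, LIMB_MAX))     # 9 / 18 points
print(total_out_of_100(upper, lower, mobility))  # 59 / 68 points
```

Pooling points across items weights mobility items more heavily than limb items, exactly as the raw total out of 70 does; a scheme that averaged the 3 subscale percentages instead would weight the subscales equally, which is why the manual's own procedure should be used in practice.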

The objectives of this study were (1) to determine the extent to which pairs of therapists concur on scores for the items of the STREAM measure, on subscale scores, and on the total score (interrater reliability); (2) to assess the consistency of STREAM scores based on the observation of videotaped performances across occasions and across raters (intrarater and interrater reliability); and (3) to examine the extent to which the items of the STREAM measure relate to each other, to their respective subscales, and to the group of items as a whole (internal consistency).


    Method
Overview of Study Design

Three separate substudies were conducted. The first objective was achieved through the "direct-observation reliability study," involving pairs of raters who directly observed and assessed patients. To realize the second objective, the "videotaped assessments reliability study" was done with raters viewing and rating videotaped assessments on 2 occasions. To accomplish the third objective, internal consistency was examined using data accrued in the direct-observation reliability study, as well as additional data collected on patients scoring in the lower end of the range.

The Direct-Observation Reliability Study

The study was carried out at JRH, a 120-bed rehabilitation hospital with a 40-bed stroke unit that admits approximately 200 stroke patients per year. Written informed consent was obtained prior to a patient's participation. A convenience sample of 20 cooperative persons with stroke was selected. These individuals were selected to represent a wide range of motor dysfunction, age, and time since stroke. We excluded patients with any major comorbid conditions that interfered with motor function or its assessment, such as a neurological condition in addition to the stroke, a severe comprehension disorder, marked bilateral motor or sensory impairment, amputation of a limb, or severe rheumatoid arthritis.

Sixteen subjects with ischemic stroke and 4 subjects with hemorrhagic stroke participated. Fourteen subjects had left-hemisphere cerebrovascular accidents (CVAs), probably reflecting a greater propensity to admit patients needing a multitude of rehabilitation services. The subjects (11 men, 9 women) ranged in age from 47 to 86 years (X̄=66.7, SD=10.7). They were evaluated on the STREAM measure from 47 to 238 days (X̄=104.5, SD=42.7) following the occurrence of their CVAs. The characteristics of these subjects are presented in Table 1.


Table 1. Characteristics of Subjects

 
Six physical therapists provided the scores for the 20 subjects in the convenience sample. Before the reliability study, each therapist independently reviewed the STREAM test manual. The 6 therapists then participated in a 2-hour training session (led by the primary author [KD]) in which the STREAM test manual was discussed, a videotaped STREAM assessment was scored, and the scores were discussed. The therapists then independently practiced using the STREAM measure to score 2 patients in the clinical milieu. (By comparison, therapists in clinical settings currently orient themselves to the STREAM measure using only the test manual or, sometimes, a group in-service session.) The evaluations for the reliability study were carried out simultaneously and independently by pairs of raters, with one therapist performing the assessment and the other therapist observing close at hand.

The 6 raters were 4 therapists working at JRH, 1 therapist from an acute care setting, and the primary author. They had 1 to 9 years of experience as physical therapists (X̄=5, SD=2.1) and had worked with patients with stroke for 1.5 to 3.5 years (X̄=2.5, SD=0.6). Details of the raters' clinical backgrounds are given in Table 2.


Table 2. Characteristics of Raters

 
The Videotaped Assessments Reliability Study

Four patients with a range of motor deficits secondary to stroke who had participated in the direct-observation reliability study and who agreed to be videotaped participated in this phase of the study. These patients were reassessed with the STREAM measure, and the assessments were videotaped.

The 4 subjects (3 men, 1 woman), identified in Table 1, ranged in age from 50 to 80 years (X̄=63.0, SD=10.8). Two of the subjects had left-sided CVAs, and the other 2 subjects had right-sided CVAs. One subject had aphasia, 2 subjects had perceptual and memory impairments, and 1 subject had mild shoulder pain. They were evaluated an average of 145 days (SD=60.4) after having had a stroke.

Twenty raters were recruited from Montreal-area health care facilities by sending notices to hospital physical therapy departments explaining the study. The raters were selected to cover a wide range of clinical backgrounds, with a minimum of 6 months of experience working with patients with stroke.

The 20 physical therapists who participated in rating the videotapes had diverse clinical backgrounds, with 6 months to 11 years of experience working with patients with stroke (X̄=4.5, SD=3.2) and 1 to 33 years of experience working as physical therapists (X̄=9.0, SD=7.9). Eight of the therapists worked in acute care settings, 9 therapists worked in inpatient and outpatient rehabilitation settings, and 3 therapists worked in long-term care (LTC) settings. Table 2 provides an overview of the raters' clinical backgrounds.

Prior to the first viewing session, the raters independently reviewed the STREAM test manual and practiced administering the STREAM measure on at least 2 patients. At the beginning of the first viewing session, the therapists practiced rating a sample videotaped performance (not the one used in the actual reliability study) and the scoring of the STREAM measure was briefly discussed as a group (led by the primary author). The 20 raters, divided into 2 groups of 10 raters for convenience of viewing, simultaneously and independently evaluated the 4 videotaped assessments. During the rating sessions, no discussion of the scoring of any of the items was permitted. Items were replayed up to 3 times upon request, as some of the smaller movements were more difficult to see on videotape and because, when the testing is done in the clinical setting, the patient is permitted up to 3 attempts to perform a test item. The raters viewed each of the 4 videotaped performances on one occasion, and on a second occasion approximately 1 month later. The videotapes were presented in a random order at each session.

Internal Consistency

For evaluating the internal consistency of scores obtained with the STREAM measure, we used the scores for the 20 patients in the direct-observation reliability study. None of these patients had a score below 30 (out of 100) on the STREAM measure. The STREAM measure, however, is intended for use in assessing the full range of motor function possible following stroke. Therefore, scores were collected for an additional 6 patients with low levels of motor function who met our inclusion criteria. Four additional physical therapists, who had participated in the videotaped assessments reliability study and who were familiar with administering the STREAM measure, provided us with the STREAM scores for these 6 patients.

The characteristics of the 20 subjects in the sample of convenience were presented previously. The 6 additional subjects with low levels of motor function (mean STREAM score=19/100, SD=9.7) were from 4 different facilities (ie, 1 LTC facility and 3 acute care hospitals). These subjects (2 men, 4 women) had an average age of 62 years (SD=14.5) and were evaluated an average of 26 days (SD=12) following stroke. Three of the subjects had ischemic strokes, and the remaining 3 subjects had hemorrhagic strokes. Two subjects had right-sided CVAs, and 4 subjects had left-sided CVAs. Three subjects were aphasic, and 4 subjects had prominent perceptual and cognitive impairments. The characteristics of these subjects are presented in Table 1.

Data Analysis

Scores for the individual items were summed to produce subscale and total scores on the STREAM measure for each subject. The subscale and total scores from the direct-observation reliability study were transformed to scores out of 100 (over the 20 subjects, a total of 15 items could not be scored because of restricted range of motion or pain). As there were no missing data in the videotaped assessments reliability study, however, raw scores were used (ie, a maximum total STREAM score of 70). SAS statistical software18,‡ was used to compute Pearson correlations, cell frequencies, kappa statistics, signed-rank statistics, and Cronbach alphas. The GENOVA version 2.1 program19 was used to obtain generalizability correlation coefficients (GCCs) for subscale and total scores. The analyses of rater agreement were parallel for the 2 reliability studies, except that only interrater agreement was evaluated in the direct-observation reliability study, whereas estimates of intrarater and interrater agreement were derived from the videotaped assessments reliability study.

Interrater and intrarater reliability.
The agreement between raters for scoring items of the STREAM measure in the direct-observation reliability study was described using the index of crude agreement (the percentage of subjects for whom the paired scores agree exactly), expected agreement, and Cohen's kappa statistic.20 Quadratically weighted kappa statistics20–22 were used; this statistic produces values equivalent to intraclass correlation coefficients (ICCs) for the same data23,24 and reflects chance-corrected agreement in which each disagreement is penalized in proportion to the square of the distance between the 2 categories assigned, so that a disagreement of 2 categories counts 4 times as much as a disagreement of 1 category. A kappa of 0 indicates agreement no better than chance, a kappa of 1 indicates perfect agreement, and negative values indicate agreement worse than chance; the closer kappa is to 1, the better the agreement between scores. As the kappa statistic is prevalence dependent, it is influenced by the distribution of scores, the variability among subjects, and the number of rating categories; meaningful values of kappa can only be derived when there is sufficient variability in scores.25,26
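The weighting described above can be made concrete with a short sketch. This is a generic implementation of quadratically weighted kappa, not the SAS procedure the authors used; the function name and the integer coding of categories are illustrative assumptions. Disagreement weights are w_ij = (i - j)^2 / (k - 1)^2 for k ordered categories, and kappa = 1 - (weighted observed disagreement) / (weighted chance-expected disagreement).

```python
# Sketch of quadratically weighted kappa for 2 raters on an ordinal scale.
# Hypothetical helper; categories must be listed in their natural order.


def quadratic_weighted_kappa(rater_a, rater_b, categories):
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)

    # Observed k x k contingency table of paired ratings.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    observed_dis = expected_dis = 0.0
    for i in range(k):
        for j in range(k):
            w = (i - j) ** 2 / (k - 1) ** 2          # squared-distance penalty
            observed_dis += w * observed[i][j] / n
            expected_dis += w * row[i] * col[j] / n ** 2  # chance from the marginals

    if expected_dis == 0:
        # All ratings fall in one category: kappa is undefined (no variability).
        raise ValueError("insufficient variability to compute kappa")
    return 1.0 - observed_dis / expected_dis


# Hypothetical ratings of one limb item (scale 0-2) by 2 raters on 4 subjects.
print(quadratic_weighted_kappa([0, 1, 2, 2], [0, 2, 2, 1], [0, 1, 2]))
```

The ValueError branch mirrors the point made in the text: when there is no spread in the scores, the expected disagreement is zero and kappa cannot be interpreted, which is why item-level kappas were not derived for the 4 videotaped subjects.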

Kappa statistics could not be derived for the individual item scores of the videotaped assessments reliability study due to insufficient variability in scores with only 4 subjects. The distributions of agreement on scoring, however, were tabulated.

For describing the agreement on subscale and composite scores, GCCs were calculated.27 For the direct-observation reliability study, these statistics reflected agreement between raters. For the videotaped assessments reliability study, GCCs described agreement both within and between raters. These statistics, based on generalizability theory, are analogous to traditional reliability coefficients, except that GCCs not only reflect the magnitude of the error, as would a traditional reliability coefficient, but also attribute the error to a specific source.19,24,28 Generalizability correlation coefficients, like ICCs, range from 0 to 1 and are based on an analysis-of-variance model. The closer the GCC is to unity, the greater the generalizability or reliability. Reliability coefficients of .95 or better are recommended as the minimal requirement for a clinical outcome measure used in making judgments about individuals.29–31 That is, only when at least 95% of the total variance is due to true variance is the risk of falsely classifying an individual acceptably small. When an individual's care is not directly affected, we can accept a somewhat lower degree of certainty, and coefficients of greater than .80 are generally considered to be acceptable when tests are used to make decisions about a group or for research purposes.29–31
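To illustrate how a GCC partitions variance, here is a minimal sketch for the simplest case: a crossed subjects x raters design with one score per cell and raters treated as random. It is not the GENOVA analysis used in the study (which also modeled occasions in the videotape study), and the function name is hypothetical. Variance components are estimated from the two-way ANOVA mean squares, then combined into a relative coefficient, σ²_p / (σ²_p + σ²_res / n'_r), and an absolute coefficient that also counts rater main effects as error, σ²_p / (σ²_p + (σ²_r + σ²_res) / n'_r), where n'_r is the number of raters whose scores are averaged in the decision.

```python
# Sketch: G coefficients for a subjects x raters crossed design (raters random).
# Hypothetical helper; scores[i][j] is the score of subject i from rater j.


def g_coefficients(scores, n_raters_decision=1):
    n_p, n_r = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n_p * n_r)
    p_means = [sum(row) / n_r for row in scores]
    r_means = [sum(row[j] for row in scores) / n_p for j in range(n_r)]

    # Two-way ANOVA without replication.
    ss_p = n_r * sum((m - grand) ** 2 for m in p_means)
    ss_r = n_p * sum((m - grand) ** 2 for m in r_means)
    ss_tot = sum((x - grand) ** 2 for row in scores for x in row)
    ss_res = ss_tot - ss_p - ss_r
    ms_p = ss_p / (n_p - 1)
    ms_r = ss_r / (n_r - 1)
    ms_res = ss_res / ((n_p - 1) * (n_r - 1))

    # Variance components (negative estimates truncated to 0, a common convention).
    var_res = ms_res                        # subject x rater interaction + error
    var_p = max((ms_p - ms_res) / n_r, 0.0)  # "true" between-subject variance
    var_r = max((ms_r - ms_res) / n_p, 0.0)  # rater leniency/severity

    relative = var_p / (var_p + var_res / n_raters_decision)
    absolute = var_p / (var_p + (var_r + var_res) / n_raters_decision)
    return relative, absolute


# Hypothetical total scores from 2 raters on 3 subjects.
rel, absl = g_coefficients([[10, 12], [20, 20], [30, 31]])
print(round(rel, 3), round(absl, 3))
```

With these made-up data almost all variance lies between subjects, so both coefficients exceed the .95 threshold cited above; the absolute coefficient is slightly lower because a rater who scores systematically higher is counted as error.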

In both the direct-observation reliability study and the videotaped assessments reliability study, subjects and raters contributed to the variability of the error terms. In the videotaped assessments reliability study, an additional source of variance was the timing of the viewing (first or second). These sources of variance can be considered as (1) "fixed," where the resultant GCC indicates the extent to which a person can generalize across the particular subjects, raters, or occasions involved in the study, or (2) "random," where the related GCC indicates the extent to which a person can generalize results to any rater, subject, or occasion. When using videotapes to assess intrarater agreement, however, the variability associated with changes in a patient's performance on separate occasions is eliminated. Thus, estimates of reliability obtained using videotapes may differ from those that would be obtained in clinical practice.

In addition to GCCs, Wilcoxon matched-pairs signed-rank statistics,32 the nonparametric equivalent of the paired-sample t test, were computed to identify trends in the scoring. This statistic examines the differences between pairs of scores, which under the null hypothesis are assumed to have a median difference of zero.33 If the signed-rank statistic is significant, this finding indicates poor agreement due to a tendency to score either consistently higher or lower on the different rating occasions (intrarater) or by either the assessing or observing raters (interrater).
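The signed-rank logic can be sketched in a few lines. This is a generic textbook version, not SAS's implementation (SAS reports its own centered statistic and uses exact probabilities for small samples); the function name is hypothetical. Zero differences are dropped, absolute differences are ranked with tied ranks averaged, and the ranks of positive and negative differences are summed separately. The two-sided P value here uses the large-sample normal approximation without tie or continuity correction, which is only rough for a handful of pairs.

```python
import math

# Sketch of the Wilcoxon matched-pairs signed-rank test (normal approximation).
# Hypothetical helper; x and y are paired scores (eg, session 1 vs session 2).


def wilcoxon_signed_rank(x, y):
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences dropped
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank |differences| from 1..n, giving tied values the average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)

    # Under H0 (median difference 0), W+ has mean n(n+1)/4.
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return w_plus, w_minus, p_two_sided


# Hypothetical total scores from 5 raters on the first and second viewings.
print(wilcoxon_signed_rank([10, 12, 15, 20, 22], [8, 12, 14, 20, 25]))
```

A large imbalance between W+ and W- signals the systematic shift the text describes (eg, raters scoring consistently higher at the second session) even when the scores are highly correlated, which is why this test complements the GCCs.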

Internal consistency.
Three statistics were calculated: (1) Pearson correlation coefficients for each possible pair of items included, (2) the correlations between the scores for individual items and the subscale and total scores, calculated by omitting that item, and (3) Cronbach alphas34 for each subscale and for the STREAM measure as a whole. Because Cronbach alphas are influenced by the total number of items included in an instrument and increase in value if related items are added, alpha values must be interpreted accordingly.
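The three internal-consistency statistics can be sketched as follows. This is a generic illustration, not the SAS output used in the study, and the helper names are hypothetical. Cronbach alpha is computed as α = k/(k - 1) × (1 - Σ item variances / variance of the total score) for k items; the corrected item-total correlation correlates each item with the sum of the remaining items; and "alpha if item deleted" recomputes alpha without that item.

```python
import math
from statistics import variance

# Sketch of internal-consistency statistics. Hypothetical helpers.
# items: list of item columns, each a list of scores (one per subject).


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cronbach_alpha(items):
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least 2 items")
    totals = [sum(subject) for subject in zip(*items)]  # per-subject total score
    return k / (k - 1) * (1 - sum(variance(col) for col in items) / variance(totals))


def corrected_item_total(items, i):
    """Correlation of item i with the total of all other items."""
    rest = [sum(s) - s[i] for s in zip(*items)]
    return pearson(items[i], rest)


def alpha_if_deleted(items, i):
    return cronbach_alpha(items[:i] + items[i + 1:])


# Hypothetical scores on 2 items for 3 subjects.
items = [[1, 2, 3], [1, 3, 2]]
print(cronbach_alpha(items), corrected_item_total(items, 0))
```

Because the numerator and denominator both use sample variances, the choice of divisor cancels. As the text notes, adding correlated items inflates alpha, so a 30-item total will tend to show a higher alpha than a 10-item subscale of similar quality.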


    Results
The Direct-Observation Reliability Study

Interrater agreement for individual items of the STREAM measure.
Over all items, there were a total of 585 paired ratings (ie, 30 items x 20 subjects = 600, less the 15 items that could not be scored and were recorded as "X"). Perfect agreement occurred for 89.4% of these ratings, disagreement by 1 category occurred for 9.6% of the ratings, and disagreement by 2 categories occurred for only 1.0% of the ratings.
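A distribution like the one above is a simple tabulation of how far apart the paired scores fall. The sketch below is illustrative only: it assumes each rating is coded as an integer on its ordinal scale and that unscorable items ("X") are coded as None and excluded, and the function name and data are hypothetical.

```python
from collections import Counter

# Sketch: percentage of paired ratings by size of disagreement (in categories).
# Hypothetical helper; None marks an item that could not be scored ("X").


def agreement_distribution(pairs):
    usable = [(a, b) for a, b in pairs if a is not None and b is not None]
    counts = Counter(abs(a - b) for a, b in usable)
    n = len(usable)
    return {distance: 100.0 * c / n for distance, c in sorted(counts.items())}


# Hypothetical paired ratings (assessor, observer) for a few items.
pairs = [(2, 2), (1, 2), (0, 2), (3, 3), (None, None)]
print(agreement_distribution(pairs))
```

The percentage at distance 0 is the crude agreement index described in the Method section; the shares at distances 1 and 2 are what the quadratic weights in the kappa statistic penalize.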

Figure 1 is a stem-and-leaf plot summarizing the distribution of kappa statistics for the interrater agreement on scores for each of the 30 items. The kappa values clustered around .8 and .9, indicating that there was excellent agreement on scoring of all of the items, with the exception of the item "rising from sitting to standing," which demonstrated a kappa value of .65.


Figure 1. Stem-and-leaf plot of kappa statistics showing interrater agreement on scoring the 30 items of the Stroke Rehabilitation Assessment of Movement (STREAM) measure in the direct-observation reliability study. Numbers in box (stem) represent first decimal place of kappa; numbers to right of box (leaves) represent second decimal place of kappa. Kappas are quadratically weighted.
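The stem-and-leaf layout in Figure 1 can be reproduced with a short sketch, following the caption's convention: the stem is the first decimal place of kappa and the leaf is the second. The kappa values in the example are hypothetical, not the study's item kappas, and the function names are illustrative.

```python
# Sketch of a stem-and-leaf display for kappas: stem = first decimal place,
# leaf = second decimal place. Hypothetical helpers and data.


def stem_and_leaf(values):
    plot = {}
    for v in sorted(values):
        stem, leaf = divmod(round(v * 100), 10)  # eg .65 -> stem 6, leaf 5
        plot.setdefault(stem, []).append(leaf)
    return plot


def render(plot):
    lines = []
    # Show every stem from highest to lowest, including empty ones.
    for stem in range(max(plot), min(plot) - 1, -1):
        leaves = "".join(str(leaf) for leaf in plot.get(stem, []))
        lines.append(f"{stem / 10:.1f} | {leaves}".rstrip())
    return "\n".join(lines)


print(render(stem_and_leaf([0.65, 0.82, 0.85, 0.91, 0.93, 0.88])))
```

Listing empty stems keeps the gaps visible, so an outlying item (such as one with a kappa in the .6 range) stands apart from the cluster at .8 and .9.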

 
Interrater agreement for STREAM subscale and total scores.
The STREAM subscale and total scores given by the 2 raters for each of the 20 subjects are graphed in Figure 2. The close proximity of the 2 lines indicates excellent interrater agreement that was consistent across the entire range of scores. The GCCs for interrater agreement on the subscale and total scores for the 20 subjects are presented in Table 3 and Figure 2. Of the 3 subscales, the scores on the upper-extremity subscale were the most reliable, followed by the scores of the lower-extremity and basic mobility subscales.


Figure 2. Direct-observation reliability study: interrater agreement. GCC=generalizability correlation coefficient.

 

Table 3. Results of the Stroke Rehabilitation Assessment of Movement (STREAM) Measure Reliability Studies

 
There was a tendency for the rater who was observing to score slightly higher than the rater who was doing the hands-on assessment. The observers and the assessors gave the same total STREAM scores for 7 of the 20 subjects; the observers gave higher total STREAM scores than did the assessors for 10 subjects and lower scores for the remaining 3. To determine whether the difference between the scores given by the 2 raters was significant, signed ranks were computed for subscale and total scores. These test statistics, along with their related probabilities, are shown in Table 3. None of the signed ranks were significant (at P<.05). The signed-rank statistic, however, was marginal for the total scores.

The Videotaped Assessments Reliability Study

Intrarater agreement for individual items of the STREAM measure.
Over the 30 items on the 4 videotapes, there were a total of 2,400 paired ratings (20 raters x 4 videotapes x 30 items). Perfect agreement occurred in 85.7% of the ratings, disagreement by 1 category occurred for 12.1% of the ratings, and for only 2.2% of the ratings were there disagreements of 2 categories. There was generally slightly less perfect agreement on scoring of the videotapes of the subjects with left-sided CVAs and perceptual problems (videotapes C and D).

Intrarater agreement for STREAM subscale and total scores.
Figure 3 shows the pattern of agreement on the total STREAM scores given by the 20 raters for each of the 4 videotapes on the 2 occasions. The close proximity of the 2 lines suggests excellent intrarater agreement. The GCCs for intrarater agreement on the subscale and total scores over the 4 videotapes are shown in Table 3. The GCCs were virtually identical for the models with raters fixed and random. Unlike the results of the direct-observation reliability study, where the upper-extremity subscale was the most reliably scored, the GCCs in this study were slightly higher for the lower-extremity and mobility subscales, followed by the upper-extremity subscale. All raters demonstrated excellent intrarater agreement on scoring the 4 videotapes, with GCCs for individual raters ranging from .982 to .999 (model with subjects and occasions fixed).


Figure 3. Videotaped assessments reliability study: intrarater agreement. GCC=generalizability correlation coefficient.

 
Although the paired ratings were generally within a few points of each other, the STREAM subscale and total scores given were consistently higher (an average of 2 points higher for total scores over the 4 videotapes) for the second rating session. The signed ranks, presented in Table 3, were significant (P<.05) for all subscales on videotapes C and D, indicating that the trend to score higher on the second viewing session was significant for these videotapes.

Interrater agreement for STREAM scores on the 2 occasions.
The relative flatness (slope near 0) of the lines that show the scores on the 2 occasions in Figure 3 indicates that the agreement between raters on the STREAM total scores was excellent on each occasion. On both occasions, the GCCs reflecting the agreement among the 20 raters on STREAM total scores given on the 4 videotaped assessments were .999 (model with subjects fixed and raters random).

Internal Consistency

The individual item-to-total correlations for the STREAM measure were calculated for our sample of 26 subjects and ranged from .579 to .926. Item-to-subscale score correlations ranged from .585 to .967. The alpha coefficients reflecting the effect that omitting a particular item would have on the overall alpha ranged from .982 to .984. Alpha coefficients were .965 for the mobility subscale and .979 for each of the limb subscales. The overall alpha coefficient for the STREAM measure was .984.


    Discussion
The extent to which the results of any reliability study can be generalized to clinical practice depends on how closely the conditions in the study approximate those of assessment in the clinical setting, that is, on how similar the subjects, the raters, and the setting and manner of evaluation are to those encountered in clinical practice.

The subjects participating in the STREAM reliability studies reflected the distribution and range of comorbid medical problems typically encountered in the stroke rehabilitation setting, and they demonstrated a wide range of motor ability. The results of these studies, therefore, should be generalizable to similar inpatient rehabilitation populations.

The clinical backgrounds of the raters involved were diverse. The raters had graduated from 6 universities, were working in 12 different facilities, and had a wide range of general and stroke-specific clinical experience. In addition, despite very little training in the use of the STREAM measure, excellent reliability was achieved. The results of our studies, therefore, should be generalizable across the spectrum of rater training and experience. That is, similar results should be achievable among physical therapists who have carefully read the STREAM test manual and have used the STREAM measure a few times in the clinical setting to familiarize themselves with the scoring form. It must be acknowledged, however, that even the brief training that the raters received could have positively influenced the reliability estimates in this study, which may limit the generalizability of the study results.

The estimates of interrater and intrarater reliability obtained for the STREAM measure under the conditions imposed in this study exceed the .95 level that some authors believe is required for clinical decision making.29–31 However, the testing procedures used in the direct-observation reliability study, where both ratings are made in a quiet environment and in the same testing session so that variability in patient performance is eliminated, represent ideal conditions for achieving reliable scores. Similarly, videotaped assessments may be performed in a more standardized fashion and under more controlled conditions than would be found in a busy clinical setting. Thus, reliability estimates obtained under the somewhat artificial conditions of the study may represent "optimal" reproducibility. The generalizability of the results of this study to the realities of the clinical setting, where therapists assess patients who may be less than cooperative, on separate occasions, and with frequent interruptions, still needs to be determined through further study.

Only one of the items retained on the completed STREAM measure demonstrated less than excellent reliability: kappa was only .65 for the "sit to stand" item. This item was retained, however, because the consensus panels had deemed it an important milestone of motor functioning and because it performed well in terms of internal consistency, correlating at .83 with the subscale score and .77 with the total score. The disagreements between raters' scores on this item (in both reliability studies) were due to difficulty differentiating between normal movement patterns (score 2 or 3) and abnormal movement patterns (score 1c). In an attempt to improve the reliability of scores for this item, a note has been added to the scoring form (ie, "Note: pushing up with hand[s] to stand=aid [score 2]; asymmetry such as trunk lean, Trendelenburg position, hip retraction, or excessive flexion or extension of the affected knee=marked deviation [score 1a or 1c]").

As shown by the plots of paired ratings (Figs. 2 and 3), the reliability of scores obtained for the STREAM measure was excellent across the full range of scores. This is an important quality for instruments intended to be used for evaluating patients with a wide range of motor capabilities. It is not clear from the literature whether the other available stroke motor assessments demonstrate similar reliability across the range of scores.

For the direct-observation reliability study, the finding of slightly greater reliability for scores on the upper-extremity subscale versus scores on the lower-extremity and mobility subscales may be due to a greater heterogeneity in patients' upper-extremity scores, as upper-extremity recovery tends to be slower and less complete than that for the lower extremity. Another possible contributor to this slightly higher reliability may be that several of the patients had flaccid upper extremities and clearly were not able to perform the test movement (score of 0), thereby reducing the possibility of rater disagreement on scoring.

In contrast, in the videotaped assessments reliability study, the upper-extremity subscale was slightly less reliably scored. Because small movements, such as movements of the hand, are more difficult to observe on videotape, this finding is not surprising. Also not unexpected were the slightly lower levels of overall agreement we observed for videotapes C and D, whose 2 subjects were moderately influenced by increased reflex activity and had perceptual problems. This finding suggests that heightened reflex activity and perceptual problems may make the scoring of movement slightly more difficult.

There was a tendency for raters to score the videotaped subjects higher during the second rating occasion than during the first rating occasion. It is possible that this trend was an artifact of the therapists' "learning curve."
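One way to examine whether such a shift between rating occasions exceeds chance is a paired, rank-based test such as the Wilcoxon signed-rank test (reference 32). The Python sketch below is a minimal version under stated assumptions, not a description of the study's analysis. Zero differences are dropped, and tied absolute differences share average ranks. The p value uses the normal approximation without a tie correction, which is only rough for small samples. The function name and the paired scores are hypothetical.

```python
import math

def wilcoxon_signed_rank(first, second):
    """Wilcoxon signed-rank test on paired scores, using differences second - first.

    Zero differences are dropped and tied absolute differences get average ranks.
    Returns (w_plus, w_minus, z, two_sided_p); z and p use the large-sample normal
    approximation with no tie correction, so they are rough for small samples.
    """
    diffs = [b - a for a, b in zip(first, second) if b != a]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank |differences| from smallest to largest, averaging ranks over ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        while end + 1 < n and abs(diffs[order[end + 1]]) == abs(diffs[order[start]]):
            end += 1
        average_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for pos in range(start, end + 1):
            ranks[order[pos]] = average_rank
        start = end + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    mean_w = n * (n + 1) / 4
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail area of the standard normal
    return w_plus, w_minus, z, p


# Hypothetical total scores for the same assessments rated on two occasions.
occasion_1 = [40, 55, 62, 30, 70, 48, 51, 66]
occasion_2 = [42, 56, 62, 33, 71, 49, 53, 66]
w_plus, w_minus, z, p = wilcoxon_signed_rank(occasion_1, occasion_2)
print(w_plus, w_minus)  # 21.0 0
print(round(z, 2), round(p, 3))
```

Here every nonzero difference is positive, so W+ reaches its maximum, n(n + 1)/2 = 21 for the 6 nonzero pairs. For samples this small, an exact test such as SciPy's `scipy.stats.wilcoxon` would be preferable to the normal approximation.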

Intrarater agreement in the videotaped assessments reliability study was slightly higher than interrater agreement in the direct-observation reliability study. This is as expected, because agreement within raters is typically better than agreement between raters. Overall interrater agreement was also slightly higher in the videotaped assessments reliability study than in the direct-observation reliability study, which we likewise expected in light of the more controlled testing conditions.

Several factors probably contributed to the very high estimates of reliability obtained for STREAM scores. The potential effects of the somewhat controlled conditions have already been discussed. In our opinion, however, several characteristics of the STREAM measure are likely to enhance the reliability of the scores regardless of the testing conditions, most notably the simple scoring scheme and the standardized testing instructions. An additional advantage is that the scoring of the STREAM measure incorporates amplitude, gross quality, and independence in mobility (ie, quantity, quality, and independence of movement) while still maintaining simplicity and objectivity. No other stroke motor assessment includes all of these dimensions of interest to therapists.

The high degree of internal consistency found for the STREAM measure suggests that the items are measuring one concept, presumably the recovery of motor function. The alpha coefficients for the STREAM measure and its subscales surpass .90, the value recommended35 for an instrument to be clinically useful for measuring a specific concept. The subjects included in the internal consistency analysis came from an inpatient rehabilitation setting as well as from acute care and LTC settings, and their scores on the STREAM measure were distributed across the entire range of possible scores. The results of this analysis, therefore, should be representative of the performance of the STREAM measure when it is used with the population for which it is intended.
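Coefficient alpha (reference 34) compares the sum of the item variances with the variance of the summed score: α = k/(k − 1) × (1 − Σσ²ᵢ / σ²ₓ) for k items. The Python sketch below computes it from a subject-by-item score matrix and checks the result against the .90 value cited above. The function name and the scores are hypothetical. Population variances are used throughout; sample variances would give the same ratio and therefore the same alpha.

```python
from statistics import pvariance

def cronbach_alpha(item_scores):
    """Cronbach's alpha; item_scores has one row per subject, one column per item."""
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Transpose so each tuple holds one item's scores across subjects.
    items = list(zip(*item_scores))
    sum_item_var = sum(pvariance(col) for col in items)
    total_var = pvariance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum_item_var / total_var)


# Hypothetical scores: 6 subjects x 4 items, each item scored 0-2.
item_matrix = [
    [0, 0, 1, 0],
    [1, 1, 1, 0],
    [1, 2, 1, 1],
    [2, 1, 2, 2],
    [2, 2, 2, 1],
    [0, 1, 0, 0],
]
alpha = cronbach_alpha(item_matrix)
print(round(alpha, 3))  # 0.877
print(alpha >= 0.90)    # False: below the .90 value cited in the text
```

Alpha tends to rise as similar items are added, because the summed-score variance grows faster than the sum of the item variances. An alpha for a full 30-item measure is therefore likely to exceed that of a shorter scale built from comparable items.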

Although we have presented internal consistency alongside rater agreement, and internal consistency is frequently considered a form of reliability in the literature, we contend that it may be more appropriate to consider internal consistency as related to validity. For example, in the process of item reduction and justification for the content of the STREAM measure, we used the correlations between an individual item's scores and (1) each of the other items' scores (inter-item correlations), (2) subscale scores (item-to-subscale correlations), and (3) total scores (item-to-total correlations).17 The full details of the internal consistency analysis are presented in this article for completeness, but the reader should recognize that internal consistency is an issue separate from rater agreement.
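The three kinds of correlation listed above can be computed directly from a subject-by-item score matrix. The Python sketch below shows one way to do it. The function names, the scores and the `corrected` option are hypothetical. The `corrected` option removes the item's own score from the subscale and total before correlating; the article does not state whether its correlations were corrected in this way. An item with no variance across subjects makes the correlation undefined and is not handled.

```python
import math

def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either list is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def item_correlations(item_scores, subscale_items, corrected=False):
    """Inter-item, item-to-subscale, and item-to-total correlations.

    item_scores: one row per subject, one column per item.
    subscale_items: column indices of the items forming one subscale.
    corrected: if True (hypothetical option), the item's own score is removed
    from the subscale and total scores before correlating.
    """
    items = [list(col) for col in zip(*item_scores)]
    results = {}
    for i in subscale_items:
        own = [row[i] if corrected else 0 for row in item_scores]
        subscale = [sum(row[j] for j in subscale_items) - o for row, o in zip(item_scores, own)]
        total = [sum(row) - o for row, o in zip(item_scores, own)]
        results[i] = {
            "inter_item": {j: pearson(items[i], items[j]) for j in range(len(items)) if j != i},
            "item_to_subscale": pearson(items[i], subscale),
            "item_to_total": pearson(items[i], total),
        }
    return results


# Hypothetical scores: 6 subjects x 4 items; items 0 and 1 form one subscale.
scores = [
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [2, 2, 1, 2],
    [1, 1, 0, 1],
    [2, 2, 2, 1],
    [0, 0, 0, 0],
]
for item, r in item_correlations(scores, [0, 1], corrected=True).items():
    print(item, round(r["item_to_subscale"], 2), round(r["item_to_total"], 2))
```

Without correction, an item contributes to the subscale and total it is correlated with. This inflates the correlations most for short subscales.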

Only one other stroke motor assessment9 has been evaluated in terms of internal consistency, although this psychometric property can conveniently be evaluated using the same data that are used to obtain estimates of rater agreement, provided that the subjects' scores span the range of the instrument. In addition, because the information obtained through internal consistency analysis can be used to support the appropriateness of the content, it would seem all the more important to evaluate internal consistency.

On all of the characteristics that are important for an outcome measure of the recovery of movement following stroke, the STREAM measure rivals other related measures. It provides a considerable amount of internally consistent and reliable information on the movement of individuals following stroke. Its greatest advantage over other measures may be its excellent clinical utility. Although the STREAM measure consists of 30 items, several features combine to facilitate rapid assessment: the simple scoring process and the way in which the instrument is organized (ie, ordinal scaling with consistent descriptions applied across all items, flow of items from supine to standing and from low to high level in terms of motor ability, standardized verbal instructions, and no need for special equipment). Clinicians are likely to appreciate these qualities of the STREAM measure.

Implications for Future Research on the STREAM Measure

The inclusion of patients with stroke who had varied clinical profiles in the reliability studies indicates that the reliability of scores obtained with the instrument is excellent across a relatively diverse population. Further tests of reliability, however, should be carried out in different clinical facilities and settings (eg, acute care, LTC) and under less controlled conditions than those of this study. Such tests would show whether the reliability of STREAM scores generalizes across institutions, patient populations, and testing conditions. The results of other measurement studies evaluating criterion validity and construct validity, and of longitudinal studies assessing the responsiveness of the STREAM measure, are pending. A training videotape is planned, and a test manual is available to help standardize the testing procedures. Ultimately, the STREAM measure will need to be used in clinical trials and evaluated in terms of its efficiency relative to other instruments for discerning the effect of treatments on the recovery of movement.


Conclusions

The reliability of scores obtained with the STREAM measure under the conditions of this study is excellent, both within and between raters, with GCCs of .99 for total scores and from .96 to .99 for subscale scores. The internal consistency of STREAM scores is also excellent, with Cronbach alphas greater than .98 on the subscales. It is anticipated that the simplicity and overall clinical utility of the STREAM measure will facilitate the incorporation of this instrument into the clinical setting for the routine objective measurement of motor function. Further testing of the measurement properties of the STREAM measure, namely validity and responsiveness, is under way.


Appendix


Figure 1


Footnotes

Ethical approval for this study was obtained from the ethics committees of the Jewish Rehabilitation Hospital and the School of Physical and Occupational Therapy, McGill University.

This research was presented, in part, in poster format at the Joint Congress of the Canadian Physiotherapy Association and the American Physical Therapy Association, June 4–8, 1994, Toronto, Ontario, Canada.

This project was funded by the Réseau de Recherche en Réadaptation de Montréal et de l'ouest du Québec (RRRMOQ). The first author was supported, in part, by a Royal Canadian Legion/Physiotherapy Foundation of Canada fellowship in gerontology.

* Working Group on Outcome Measures in Physiotherapy. Health and Welfare Canada Contract, 1992. Address requests for information to: Dr Nancy Mayo, The STREAM Research Group, School of Physical and Occupational Therapy, McGill University, Davis House, 3654 Drummond St, Montreal, Quebec, Canada H3G 1Y5.

† Available on request from Dr Mayo.

‡ SAS Institute Inc, PO Box 8000, Cary, NC 27511.


References

1. Ashburn A. A physical assessment for stroke patients. Physiotherapy. 1982;68:109–113.
2. Bobath B. Adult Hemiplegia: Evaluation and Treatment. London, England: William Heinemann Medical Books Ltd; 1978.
3. Brunnström S. Movement Therapy in Hemiplegia. New York, NY: Harper & Row; 1970.
4. Carr JH, Shepherd RB, Nordholm L, Lynne D. Investigation of a new motor assessment scale for stroke patients. Phys Ther. 1985;65:175–180.
5. Fugl-Meyer A, Jaasko L, Leyman I, et al. The post-stroke hemiplegic patient, I: a method for evaluation of physical performance. Scand J Rehabil Med. 1975;7:13–31.
6. Gowland C, Torresin W, Stratford PW. Chedoke-McMaster Stroke Assessment: a comprehensive clinical and research measure. In: Proceedings of the 11th International Congress of the World Confederation for Physical Therapy, Barbican Centre, London, England, 1991. London, England: Chartered Society of Physiotherapy; 1991;2:851–853.
7. Guarna F, Corriveau H, Chamberland J, Arsenault B. An evaluation of the hemiplegic subject based on the Bobath approach. Scand J Rehabil Med. 1988;20:1–16.
8. Lindmark B, Hamrin E. Evaluation of functional capacity after stroke as a basis for active intervention. Scand J Rehabil Med. 1988;20:103–115.
9. LaVigne JM. Hemiplegia sensorimotor assessment form. Phys Ther. 1974;54:128–134.
10. Lincoln A, Leadbitter I. Assessment of motor function in stroke patients. Physiotherapy. 1979;65:48–51.
11. Badke MB, Duncan PW. Patterns of rapid motor responses during postural adjustments when standing in healthy subjects and hemiplegic patients. Phys Ther. 1983;63:13–20.
12. Bernspang B, Asplund K, Eriksson S, Fugl-Meyer A. Motor and perceptual impairments in acute stroke patients: effects on self-care activity. Stroke. 1987;18:1081–1086.
13. Dettmann M, Linder MT, Sepic SB. Relationships among walking performance, postural stability, and functional assessments of the hemiplegic patient. Am J Phys Med. 1987;66:77–90.
14. Gowland C. Recovery of motor function following stroke: profile and predictors. Physiotherapy Canada. 1982;34:77–84.
15. Henley S, Pettit S, Todd-Pokropek A, Tupper A. Who goes home? Predictive factors in stroke recovery. J Neurol Neurosurg Psychiatry. 1985;48:1–6.
16. Loewen S, Anderson B. Predictors of stroke outcome using objective measurement scales. Stroke. 1990;21:78–81.
17. Daley K, Mayo N, Danys I, et al. The Stroke Rehabilitation Assessment of Movement (STREAM): refining and validating the content. Physiotherapy Canada. 1997;49:269–278.
18. SAS Procedures Guide for Personal Computers, Version 6 Edition. Cary, NC: SAS Institute Inc; 1985.
19. Crick T, Brennan R. GENOVA: A Generalized Analysis of Variance System, Version 2. Iowa City, Iowa: American College Testing Program Inc; 1983.
20. Cohen J. A coefficient of agreement for nominal scales. Educational and Psychological Measurement. 1960;20:37–46.
21. Cohen J. Weighted kappa: nominal scale agreement with provision for scale disagreement or partial credit. Psychol Bull. 1968;70:213–220.
22. Kramer M, Feinstein A. Clinical biostatistics, LIV: the biostatistics of concordance. Clin Pharmacol Ther. 1981;29:111–123.
23. Fleiss J, Cohen J. The equivalence of weighted kappa and the intraclass correlation coefficient as a measure of reliability. Educational and Psychological Measurement. 1973;33:613–619.
24. Streiner D, Norman G. Health Measurement Scales: A Practical Guide to Their Development and Use. Oxford, England: Oxford University Press; 1991.
25. Feinstein A, Cicchetti D. High agreement but low kappa, I: the problems of two paradoxes. J Clin Epidemiol. 1990;43:543–549.
26. Soeken K, Prescott P. Issues in the use of kappa to estimate reliability. Med Care. 1986;24:733–741.
27. Cronbach L, Gleser G, Nanda H. The Dependability of Behavioral Measurements: Theory of Generalizability for Scores and Profiles. New York, NY: John Wiley & Sons Inc; 1972.
28. DeVellis R. Scale Development: Theory and Applications. London, England: Sage Publications Ltd; 1991.
29. Helmstadter G. Principles of Psychological Measurement. New York, NY: Appleton-Century-Crofts; 1964.
30. Nunnally J. Psychometric Theory. New York, NY: McGraw-Hill Inc; 1978.
31. Weiner E, Stewart B. Assessing Individuals. Boston, Mass: Little, Brown & Co Inc; 1984.
32. Wilcoxon F. Individual comparisons by ranking methods. Biometrics Bulletin. 1945;1:80–89.
33. Colton T. Statistics in Medicine. Boston, Mass: Little, Brown & Co Inc; 1974.
34. Cronbach L. Coefficient alpha and the internal structure of tests. Psychometrika. 1951;16:297–334.
35. Feinstein A. Clinametrics. New Haven, Conn: Yale University Press; 1987.

Copyright © 1999 by the American Physical Therapy Association.