PHYS THER
Vol. 88, No. 7, July 2008, pp. 888-890
DOI: 10.2522/ptj.2008.88.7.888


Letters and Responses

On "Test-retest reliability and minimal detectable change on balance..." Steffen T, Seney M. Phys Ther. 2008;88:733–746.


Translating reliability coefficients into clinically meaningful representations of measurement error is a necessary and important step when the goal is to link clinical research to clinical practice. The study by Steffen and Seney¹ investigates the reliability of several balance and ambulation tests and converts the obtained coefficients into minimal detectable change (MDC) estimates. The authors apply Shrout and Fleiss² type 3,k intraclass correlation coefficients (ICC) to quantify relative reliability and, from these estimates, they calculate the standard error of measurement (SEM) to quantify measurement error in the same units as the original measurement. For some of the balance and ambulation tests, 2 trials were performed on each of 2 occasions (eg, Timed "Up & Go" Test [TUG]); for other tests (eg, Six-Minute Walk Test [6MWT]), a single measurement was performed on each of 2 occasions. In the former case, the authors reported a type 3,2 ICC; in the latter case, they presented a type 3,1 ICC.
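The step from a reliability coefficient to SEM and MDC can be sketched in a few lines. This is a minimal illustration that assumes the commonly used formulas SEM = SD × √(1 − ICC) and MDC95 = 1.96 × √2 × SEM; whether Steffen and Seney used exactly these expressions is an assumption, and the standard deviation and ICC below are hypothetical values, not data from their study.

```python
import math


def sem_from_icc(sd: float, icc: float) -> float:
    """Standard error of measurement, assuming SEM = SD * sqrt(1 - ICC)."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie between 0 and 1")
    return sd * math.sqrt(1.0 - icc)


def mdc(sem: float, z: float = 1.96) -> float:
    """Minimal detectable change, assuming MDC = z * sqrt(2) * SEM.

    The sqrt(2) term reflects that a change score is the difference
    between 2 measurements, each carrying its own error.
    """
    return z * math.sqrt(2.0) * sem


# Hypothetical inputs (NOT taken from Steffen and Seney): a between-person
# standard deviation of 4.0 seconds and a reliability coefficient of .84.
sd_seconds = 4.0
icc = 0.84

sem = sem_from_icc(sd_seconds, icc)
mdc95 = mdc(sem)
print(f"SEM = {sem:.2f} s, MDC95 = {mdc95:.2f} s")
```

Because the ICC enters under the square root, the choice of coefficient (for example, type 2,1 versus type 3,1) carries directly into the MDC that a clinician would apply.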

The authors’ rationale for applying the type 3,k ICC was "The ICC(3,k) was used instead of the Pearson correlation coefficient (r) for test-retest reliability because it assesses rating reliability by comparing the variability of different ratings of the same subject with the total variation across all ratings and all subjects."¹(pp740–741) In fact, the type 3,1 ICC provides an estimate of reliability similar to the Pearson r because neither coefficient accounts for a systematic difference in scores between the replicate measures (eg, either trials or occasions in Steffen and Seney's study). Presumably, in a test-retest reliability study, one is interested in both systematic and random errors, and, if this is true, the type 2,k ICC is the better choice because it includes both sources of variance in the reliability coefficient calculation. When the systematic error is zero, the type 2,k and 3,k ICCs provide identical estimates of reliability. However, when systematic error is present, as in the case of Steffen and Seney's 6MWT data, the type 2,k ICC will be less than the type 3,k ICC.
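The contrast between the two coefficients can be checked numerically. The sketch below computes type 2,1 and type 3,1 ICCs from the mean squares of a persons-by-measurements ANOVA using the Shrout and Fleiss formulas, and sets a negative estimate of the between-measurement variance to zero. The scores are hypothetical and were chosen only so that a constant shift can be added to the second occasion; the function and variable names are illustrative.

```python
import numpy as np


def icc_2_1_and_3_1(scores):
    """Return (ICC(2,1), ICC(3,1)) for an n-persons x k-measurements array."""
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()   # between persons
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()   # between measurements
    ss_err = ((x - grand) ** 2).sum() - ss_rows - ss_cols  # residual
    bms = ss_rows / (n - 1)
    jms = ss_cols / (k - 1)
    ems = ss_err / ((n - 1) * (k - 1))
    # Systematic (between-measurement) term; negative estimates set to zero.
    systematic = max(jms - ems, 0.0)
    icc31 = (bms - ems) / (bms + (k - 1) * ems)
    icc21 = (bms - ems) / (bms + (k - 1) * ems + k * systematic / n)
    return icc21, icc31


# Hypothetical scores for 6 persons on 2 occasions (not the letter's Table 1).
occasion1 = np.array([10.0, 12.0, 14.0, 16.0, 18.0, 20.0])
occasion2 = occasion1 + np.array([0.5, -0.5, 0.5, -0.5, 0.5, -0.5])
shifted2 = occasion2 + 3.0  # same scores with a systematic 3-unit increase

icc_no_shift = icc_2_1_and_3_1(np.column_stack([occasion1, occasion2]))
icc_shift = icc_2_1_and_3_1(np.column_stack([occasion1, shifted2]))
r_no_shift = np.corrcoef(occasion1, occasion2)[0, 1]
r_shift = np.corrcoef(occasion1, shifted2)[0, 1]

print("no shift: ICC(2,1)=%.3f ICC(3,1)=%.3f r=%.3f" % (*icc_no_shift, r_no_shift))
print("shift:    ICC(2,1)=%.3f ICC(3,1)=%.3f r=%.3f" % (*icc_shift, r_shift))
```

With these data the type 3,1 ICC and the Pearson r are unchanged by the shift, whereas the type 2,1 ICC falls, which is the behavior described in the paragraph above.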

My second reflection addresses the use of the Shrout and Fleiss classification system in situations where 2 or more facets exist, such as for the TUG data. Here, the facets are trials and occasions. A dilemma occurs when attempting to interpret the meaning of the type 3,2 ICC reported by Steffen and Seney. It is not clear if the second digit (2) refers to 2 trials, 2 occasions, or 2 trials performed on each of 2 occasions (ie, a total of 4 measurements). I propose that a generalizability³ approach to the analysis has the potential to provide a clearer picture of the sources of variance, their magnitude, and the relative merits of averaging over either trials or occasions, or both.

To illustrate the points raised above, I have generated synthetic data for the TUG. Paralleling the design of Steffen and Seney, the synthetic data represent 2 TUG trials performed on each of 2 occasions for 10 persons. The data presented in Table 1 were contrived to illustrate a systematic difference between occasions, but no systematic difference between trials.


Table 1. Synthetic Timed "Up & Go" Data

 
Table 2 reports the mean scores for trials and occasions. Of interest is that the trial means averaged over occasions are almost identical; however, the occasion means differ. Stated another way, a systematic difference exists between occasions, but not between trials averaged over occasions.


Table 2. Trial and Occasion Means

 
Table 3 displays Shrout and Fleiss type 2,1 and type 3,1 ICCs obtained by performing randomized block analysis of variance (ANOVA). Negative variance estimates were set to zero for all analyses. Pearson r values also are reported in this table. That the inter-trial type 2,1 and 3,1 ICCs are identical to 2 decimal places reflects the similarity of trial means shown in Table 2. By contrast, the inter-occasion means shown in Table 2 differed, and this systematic difference is not reflected in the type 3,1 ICC or in the Pearson r. Accordingly, the inter-occasion type 3,1 ICC is greater than the type 2,1 ICC because the variance due to occasion is greater than zero.


Table 3. Type 2,1 and 3,1 Inter-trial and Inter-occasion Intraclass Correlation Coefficients (ICC)

 
The following section illustrates a generalizability analysis that includes both trials and occasions in a single analysis. I applied a 3-way random effects ANOVA. The rationale for applying a random effects model was that I wished to generalize beyond the persons, trials, and occasions composing the study sample. The ANOVA and variance components were calculated using MINITAB statistical software*, and the results appear in Table 4. Once again, negative variance estimates were set to zero.
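For readers without MINITAB, the variance components of a fully crossed persons × trials × occasions design with one observation per cell can be estimated from the ANOVA mean squares. The sketch below uses the standard expected-mean-square equations for a 3-way random effects model and sets negative estimates to zero. It is an illustration under those assumptions, not a reproduction of the MINITAB procedure, and the demonstration data are randomly generated rather than taken from Table 1.

```python
import numpy as np


def variance_components(x):
    """Variance components for a persons x trials x occasions array.

    x has shape (n_p, n_t, n_o) with one score per cell. All 3 facets are
    treated as random; negative estimates are set to zero.
    """
    x = np.asarray(x, dtype=float)
    n_p, n_t, n_o = x.shape
    g = x.mean()
    m_p, m_t, m_o = x.mean(axis=(1, 2)), x.mean(axis=(0, 2)), x.mean(axis=(0, 1))
    m_pt, m_po, m_to = x.mean(axis=2), x.mean(axis=1), x.mean(axis=0)

    # Sums of squares for main effects and 2-way interactions.
    ss_p = n_t * n_o * ((m_p - g) ** 2).sum()
    ss_t = n_p * n_o * ((m_t - g) ** 2).sum()
    ss_o = n_p * n_t * ((m_o - g) ** 2).sum()
    ss_pt = n_o * ((m_pt - m_p[:, None] - m_t[None, :] + g) ** 2).sum()
    ss_po = n_t * ((m_po - m_p[:, None] - m_o[None, :] + g) ** 2).sum()
    ss_to = n_p * ((m_to - m_t[:, None] - m_o[None, :] + g) ** 2).sum()
    ss_res = ((x - g) ** 2).sum() - (ss_p + ss_t + ss_o + ss_pt + ss_po + ss_to)

    # Mean squares.
    ms_p = ss_p / (n_p - 1)
    ms_t = ss_t / (n_t - 1)
    ms_o = ss_o / (n_o - 1)
    ms_pt = ss_pt / ((n_p - 1) * (n_t - 1))
    ms_po = ss_po / ((n_p - 1) * (n_o - 1))
    ms_to = ss_to / ((n_t - 1) * (n_o - 1))
    ms_res = ss_res / ((n_p - 1) * (n_t - 1) * (n_o - 1))

    # Solve the expected-mean-square equations for each component.
    raw = {
        "p": (ms_p - ms_pt - ms_po + ms_res) / (n_t * n_o),
        "t": (ms_t - ms_pt - ms_to + ms_res) / (n_p * n_o),
        "o": (ms_o - ms_po - ms_to + ms_res) / (n_p * n_t),
        "pt": (ms_pt - ms_res) / n_o,
        "po": (ms_po - ms_res) / n_t,
        "to": (ms_to - ms_res) / n_p,
        "res": ms_res,
    }
    return {name: max(float(value), 0.0) for name, value in raw.items()}


# Randomly generated demonstration data: 10 persons, 2 trials, 2 occasions,
# with a 2-unit systematic increase on the second occasion.
rng = np.random.default_rng(0)
person = rng.normal(20.0, 5.0, size=(10, 1, 1))
occasion = np.array([0.0, 2.0]).reshape(1, 1, 2)
demo = person + occasion + rng.normal(0.0, 1.0, size=(10, 2, 2))
for name, value in variance_components(demo).items():
    print(f"{name:>3}: {value:.3f}")
```

The design requires at least 2 levels of every facet; with a single trial or a single occasion the corresponding degrees of freedom are zero and the components cannot be separated.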


Table 4. Analysis of Variance and Variance Components

 
Inspection of the variance components reveals the following important findings: (1) there is a large variance among persons, and this is desirable; (2) the variance between trials averaged over occasions is zero (this reflects the near identical means reported in Table 2); (3) there is a relatively large variance due to occasions (this reflects the difference in occasion means reported in Table 2); (4) the person by occasion (P × O) variance is substantially greater than the person by trial (P × T) variance (this suggests that averaging over occasions will have a greater effect than averaging over trials); and (5) the residual error is relatively small compared with the person variance.

The variance components reported in Table 4 can be applied to calculate generalizability coefficients that represent inter-trial and inter-occasion reliability. They also can be used to examine the distinct effect of averaging over trials, occasions, or both.
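The effect of averaging can be explored with a short calculation. The sketch below uses the textbook index of dependability for a fully crossed design in which persons, trials, and occasions are all random; it is a generic generalizability-theory form and is not claimed to match the specific inter-trial and inter-occasion equations used in this letter. The variance components are hypothetical values chosen so that the occasion and person-by-occasion components exceed the trial and person-by-trial components; they are not the Table 4 estimates.

```python
def dependability(vc, n_t, n_o):
    """Index of dependability for the mean of n_t trials on n_o occasions.

    vc maps "p", "t", "o", "pt", "po", "to", and "res" to variance
    components. Every non-person component counts as absolute error and is
    divided by the number of conditions it is averaged over.
    """
    absolute_error = (
        vc["t"] / n_t
        + vc["o"] / n_o
        + vc["pt"] / n_t
        + vc["po"] / n_o
        + vc["to"] / (n_t * n_o)
        + vc["res"] / (n_t * n_o)
    )
    return vc["p"] / (vc["p"] + absolute_error)


# Hypothetical variance components (NOT the values in Table 4).
components = {"p": 20.0, "t": 0.0, "o": 2.0, "pt": 0.2, "po": 3.0, "to": 0.0, "res": 1.0}

for n_t, n_o in [(1, 1), (2, 1), (1, 2), (2, 2)]:
    phi = dependability(components, n_t, n_o)
    print(f"trials={n_t}, occasions={n_o}: coefficient={phi:.2f}")
```

When the occasion-related components dominate, as in these hypothetical values, doubling the number of occasions raises the coefficient more than doubling the number of trials.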

The theoretical inter-trial reliability (generalizability) for a single trial is obtained by substituting the variance components into Equation 1 and by setting n_t and n_o to 1. The obtained value is .97, and this is analogous to the Shrout and Fleiss type 2,1 inter-trial ICCs of .96 reported in Table 3. The inter-trial reliability for an average of 2 trials can be obtained by setting n_t to 2 and n_o to 1. This yields an inter-trial reliability of .98, which is analogous to a Shrout and Fleiss type 2,2 ICC.



[Equation 1]

When the goal is to draw inferences about the change status of a person, as is the case when MDC is applied, the inter-occasion reliability (generalizability) coefficient is of interest. It is calculated by applying Equation 2. The theoretical inter-occasion reliability for a single trial is obtained by substituting the variance components into Equation 2 and by setting n_t and n_o to 1. This gives an inter-occasion reliability of .74, which is the average of the 2 inter-occasion reliability estimates reported in Table 3. The inter-occasion reliability for a single trial performed on each of 2 occasions is obtained by setting n_t to 1 and n_o to 2. This yields an inter-occasion reliability of .85.



[Equation 2]

Finally, one can examine the inter-occasion reliability for the average of 2 trials on each of 2 occasions. This is accomplished by setting n_t to 2 and n_o to 2 in Equation 2. A value of .86 is obtained, and, to my knowledge, there is no equivalent Shrout and Fleiss coding scheme to represent this combination.

Paul W Stratford

PW Stratford, PT, MSc, is Professor, School of Rehabilitation Science, McMaster University, Hamilton, Ontario, Canada.


   Footnotes
 
This letter was posted as a Rapid Response on June 3, 2008, at www.ptjournal.org.

* Minitab Inc, Quality Plaza, 1829 Pine Hall Rd, State College, PA 16801-3008.

References

  1. Steffen T, Seney M. Test-retest reliability and minimal detectable change on balance and ambulation tests, the 36-Item Short-Form Health Survey, and the Unified Parkinson Disease Rating Scale in people with parkinsonism. Phys Ther. 2008;88:733–746.
  2. Shrout PE, Fleiss JL. Intraclass correlation: uses in assessing rater reliability. Psychol Bull. 1979;86:420–428.
  3. Brennan RL. Elements of Generalizability Theory. Iowa City, Iowa: ACT Publications; 1983.

Copyright © 2008 by the American Physical Therapy Association.