III STEP Series
SM Haley, PT, PhD, is Associate Director, Health and Disability Research Institute, Boston University, Boston, Mass, and Director of Research, Research Center for Children With Special Health Care Needs, Franciscan Hospital for Children, Boston, Mass
MA Fragala-Pinkham, PT, MS, is Research Associate, Health and Disability Research Institute, Boston University, and Clinical Researcher, Research Center for Children With Special Health Care Needs, Franciscan Hospital for Children
Address all correspondence to Dr Haley at Health and Disability Research Institute, Boston University, 53 Bay State Rd, Boston, MA 02215 (USA) (smhaley@bu.edu).
Submitted July 9, 2005;
Accepted December 21, 2005
| Abstract |
|---|
Key Words: Health status; Measurement: applied; Outcome assessment (health care); Pediatrics; Physical disability; Professional issues
| Introduction |
|---|
The physical therapist providing intervention (and others) may have a number of questions regarding how to interpret the functional test results described in the case. For example: What do the summary scores from the outcome measures mean? How do we interpret the change score? Has the child achieved "clinically significant change" up to this point in the hospitalization and physical therapy episode of care? Is the change meaningful? Is the change score beyond measurement error that would typically occur in the routine administration of this measure? How can these scores be used to help examine the patterns of mobility changes that have taken place? Because the meanings of scores on a standardized instrument are not intuitively apparent,1 there is a need to provide meaning to scores that result from tests and measures used in physical therapist practice.
Physical therapy and other health care fields are beginning to explore, in increasing depth, the proper interpretation of tests and measures and the clinical changes that score improvements represent. Measures to detect important effects related to physical therapy intervention must be valid (ie, measure what is intended), responsive (ie, able to detect an important change, even if that change is small), and interpretable (ie, the intended audience must understand the magnitude of effect).1,2 At the center of this issue of "interpretability" is the attempt to have a better understanding of a "clinically significant difference" (CSD).3,4 Understanding CSD can be a bewildering endeavor, particularly with the myriad of terms and acronyms that are used across different fields and traditions. A number of terms to describe the phenomenon of CSD have been proposed, but different terms often have a similar meaning, such as "reliable change index" (RCI) and "minimal detectable change" (MDC), or "minimal clinically important difference" (MCID) and "minimal important difference" (MID).
Various audiences may have very different perspectives on CSD. For example, from a patient's point of view, a clinically significant change could result from greater freedom to resume previous activities; for a physical therapist, however, CSD may provide an indication to change the course of intervention. For other audiences such as payers, CSD may have a broader definition relating to a reduction in costs and utilization of future health care dollars. Crosby et al5 and Wells et al6 provided comprehensive reviews of CSD and its associated terminology.
In this article, we will present a selected perspective on how physical therapists can interpret clinical changes both at the individual and group levels. Our presentation will adopt a deductive approach toward identifying the meaning of clinical change by using information from group studies and applying these findings to individual patients. Cella et al3 provided a detailed discussion of the merits of both deductive and inductive (starting with the individual and applying findings to group analyses) approaches toward defining meaningful changes. We also will highlight some remaining challenges that will need to be solved, particularly with the accelerating use of instruments designed with item response theory (IRT) methods, so that the meaning of CSD can be more readily understood by physical therapists, patients, and other interested parties.
We approach the topic of CSD by identifying 2 complementary but distinct methods. Distribution-based methods rely on expressing change scores in terms of an underlying sampling distribution, whether in between-person standard deviation units, within-person standard deviation units, or some variation of the standard error of measurement (SEM). These methods are based on statistical significance, sample variability, and measurement precision. In contrast, anchor-based approaches require an external, independent standard to "anchor" the meaning of clinical importance, one that is itself interpretable and at least moderately correlated with the test or measure. We will highlight an example of both distribution-based and anchor-based methods for interpreting the functional outcome data in the physical therapy case presented above. Eton et al,7 Wyrwich,8 and Schmitt and Di Fabio9 provided a more comprehensive review of both distribution- and anchor-based methods.
| Minimal Detectable Change |
|---|
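The MDC is calculated from the standard error of measurement (SEM):

$$\mathrm{MDC} = z \times \mathrm{SEM} \times \sqrt{2}, \qquad \mathrm{SEM} = \mathrm{SD}\sqrt{1 - r}$$

where z is the z score associated with the chosen level of confidence (1.65 for a 90% CI), SD is the standard deviation of the scores, and r is the reliability coefficient. The multiplier √2 accounts for the additional uncertainty introduced by using difference scores from measurements at 2 points in time. The MDC is considered the minimal amount of change that is not likely to be due to chance variation in measurement. For the case example, we will use a 90% CI, because that level seems to be the most common standard used in the literature; however, an MDC at a 95% CI or another level could be selected, depending on the precision needed for the score estimate.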
A vital choice in calculating the SEM is whether one uses internal consistency or test-retest reliability to calculate MDC. Although Wyrwich and colleagues8,12 argued for using internal consistency (Cronbach alpha), we favor the more conservative approach of using test-retest reliability. The size of the reliability coefficient that is used is a very critical element in the equation; therefore, instruments that cannot demonstrate good stability across repeated tests will have sizable MDCs.
It is interesting to note that the use of a form of the SEM for understanding the extent of estimated measurement error is not new in physical therapy. Hinderer et al13 proposed using the SEM (with a test-retest correlation estimate) to determine the extent to which the Peabody Developmental Motor Scales were stable in the context of determining clinical change in pediatric patients. We should not be fooled into thinking that recent updates of terms by authors, or minor changes in error calculations, are something new to the field of physical therapy tests and measures. We may not have fully appreciated the importance of estimating distributional errors in tests and measures used in physical therapist practice; however, the approach toward estimating distributional measurement precision has been recommended for more than a decade.
In the case example, we use the Pediatric Evaluation of Disability Inventory (PEDI) as a broad functional measure in an inpatient rehabilitation setting. The PEDI14 is designed to measure functional status in children and youths between the ages of 6 months and 7.5 years in 3 content domains: self-care, mobility, and social function. The PEDI is routinely used in the physical therapy, occupational therapy, and speech-language-hearing departments of many hospitals to generate numerical scores that reflect children's functional change from inpatient admission to discharge. In our example, we use only the Functional Skills Mobility Scale. For use in the case study, we have determined that the MDC90 for the PEDI in an inpatient setting, populated largely by children with severe brain injuries, is 5.1 points.15 This value was obtained by using a 90% CI, a standard deviation of 15.4 for children seen at hospital admission on the PEDI Functional Skills Mobility Scale (0–100 scale), and a test-retest reliability (ICC) of .96, based on a previously published report16 and our own internal testing. Based on these results, the change across the 2 time points (from 6.1 at admission to 35.9 at 3 weeks, a difference of 29.8 points) exceeds the MDC90 value, and the change is not likely due to chance variation or random measurement error.
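As a rough check on the arithmetic above, the following Python sketch computes the SEM and MDC from the values reported for the PEDI Functional Skills Mobility Scale (SD = 15.4, test-retest ICC = .96). It assumes SEM = SD × √(1 − r) and takes z from the standard normal distribution; the function names are hypothetical. With these inputs, z × SEM reproduces the 5.1-point MDC90 (and a 6.0-point MDC95), whereas also applying the √2 factor for difference scores gives about 7.2 points. Either way, the 29.8-point change in the case example is well beyond the threshold.

```python
import math
from statistics import NormalDist


def sem_from_reliability(sd: float, reliability: float) -> float:
    """Standard error of measurement, SEM = SD * sqrt(1 - r)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def mdc(sd: float, reliability: float, confidence: float = 0.90,
        difference_scores: bool = True) -> float:
    """Minimal detectable change at a two-sided confidence level.

    If difference_scores is True, the SEM is multiplied by sqrt(2) to allow
    for measurement error at both test occasions; otherwise z * SEM is returned.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)  # about 1.645 for 90%
    factor = math.sqrt(2.0) if difference_scores else 1.0
    return z * sem_from_reliability(sd, reliability) * factor


# Values reported for the PEDI Functional Skills Mobility Scale case
sd, icc = 15.4, 0.96
admission, three_weeks = 6.1, 35.9
change = three_weeks - admission  # about 29.8 points

print(round(sem_from_reliability(sd, icc), 2))                # 3.08
print(round(mdc(sd, icc, 0.90, difference_scores=False), 1))  # 5.1
print(round(mdc(sd, icc, 0.95, difference_scores=False), 1))  # 6.0
print(round(mdc(sd, icc, 0.90), 1))                           # 7.2
print(change > mdc(sd, icc, 0.90))                            # True
```

Because the reliability coefficient enters through √(1 − r), small drops in r inflate the MDC quickly, which is why instruments with poor test-retest stability have sizable MDCs.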
We have found the MDC useful for interpreting changes in a case report recently published in Physical Therapy describing the changes observed after a 26-week fitness intervention for children with disabilities.17 By using the MDC, we were able to identify reliable changes in function, strength, and walking efficiency in 6 of the 9 children following a twice weekly group strength and endurance training program.
| MDC Proportion |
|---|
The variability in individual responses highlights the fundamental problem of summarizing treatment effects as a difference in means. In this example, 50% of the children achieved a positive change in knee extensor force production that exceeded the MDC90 value of 1.8 kg. A further subgroup analysis also can be conducted using the MDC, which indicated that, of the children who exceeded the MDC90 value, 59% were from the developmental disability group (children with intellectual disabilities, pervasive developmental disorders, or genetic disorders with intellectual or behavioral components) and only 28.6% were from the neuromuscular group (children with cerebral palsy, Duchenne muscular dystrophy, or traumatic brain injury). Reporting the proportion of patients achieving a degree of improvement that is beyond measurement error is a more informative method for describing the effects of the intervention than overall mean change.
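Reporting the proportion of patients who exceed the MDC, overall and by subgroup, is simple to automate. The sketch below is a minimal illustration; the function names and the change scores in the usage example are hypothetical and are not data from the fitness study. The 1.8-kg threshold is the MDC90 for knee extensor force cited above.

```python
from typing import Dict, Iterable, List, Tuple


def proportion_exceeding(changes: Iterable[float], threshold: float) -> float:
    """Fraction of individuals whose change score is greater than the threshold."""
    values = list(changes)
    if not values:
        raise ValueError("need at least one change score")
    return sum(1 for c in values if c > threshold) / len(values)


def proportion_by_group(records: Iterable[Tuple[str, float]],
                        threshold: float) -> Dict[str, float]:
    """Apply proportion_exceeding separately within each group label."""
    groups: Dict[str, List[float]] = {}
    for label, change in records:
        groups.setdefault(label, []).append(change)
    return {label: proportion_exceeding(vals, threshold)
            for label, vals in groups.items()}


MDC90_KG = 1.8  # MDC90 for knee extensor force reported in the text

# Hypothetical change scores (kg), for illustration only
records = [("developmental", 2.5), ("developmental", 2.0),
           ("neuromuscular", 3.1), ("neuromuscular", -0.2)]

print(proportion_exceeding((c for _, c in records), MDC90_KG))  # 0.75
print(proportion_by_group(records, MDC90_KG))  # {'developmental': 1.0, 'neuromuscular': 0.5}
```

Counting individuals against the MDC keeps the heterogeneity of responses visible, whereas a single mean change would blend responders and nonresponders together.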
In the case example at the beginning of this article, change between the 2 administrations of the PEDI exceeded the MDC90 value for the PEDI used in the inpatient setting. Some would argue, however, that, although the change noted is likely not due to measurement error, the MDC by itself does not provide us with an answer as to whether the change is clinically significant. (We will explore anchor-based indexes of change to address this concern later in the article.)
For most applications of the MDC, we assume that the amount of measurement error is constant along the entire functional scale. If one does not want to accept this assumption, Stratford et al19 provided a solution by demonstrating the usefulness of the conditional SEM with a common measure of physical disability. The MDC is based on a summary score metric, so it gives little to no attention to the pattern of changes at the item level. This inability to take into account changes in responses to individual items is a limitation of the classical test theory approach on which the MDC calculation is based. In summary, because the MDC is one of a family of distribution-based methods, it is easy to generate (it requires no additional data collection) and can serve as an important adjunct for estimating reliable change in a wide variety of tests and measures used routinely by physical therapists in clinical practice. Its interpretation is somewhat limited, however, because it assumes that detectable changes are uniform at every point along the scale. In practice, as discussed below, measurement error varies at different points along the scale. A strong advantage of IRT is that standard errors can be calculated at each point along the scale; as will be highlighted in the case discussion, these standard errors are usually larger at the score extremes and smaller in the middle of the scale.
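To illustrate why IRT standard errors vary along the scale, the sketch below assumes a dichotomous Rasch (one-parameter) model. In this model, each item contributes information P(1 − P) at ability θ, and the standard error is 1/√(total information). The item difficulties (in logits) are hypothetical. With these items, the standard error is smallest near the middle of the item difficulties and grows toward the extremes, unlike the single SEM used for the MDC.

```python
import math
from typing import Sequence


def rasch_p(theta: float, b: float) -> float:
    """Probability of passing an item of difficulty b at ability theta (Rasch model)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def standard_error(theta: float, difficulties: Sequence[float]) -> float:
    """SE(theta) = 1 / sqrt(test information), where information = sum of P(1 - P)."""
    information = sum(rasch_p(theta, b) * (1.0 - rasch_p(theta, b))
                      for b in difficulties)
    return 1.0 / math.sqrt(information)


# Hypothetical item difficulties in logits
difficulties = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]

for theta in (-4.0, -2.0, 0.0, 2.0, 4.0):
    print(f"theta = {theta:+.1f}  SE = {standard_error(theta, difficulties):.2f}")
```

Adding items located near the extremes would raise the information there and shrink the standard error, which is one motivation for tailoring item selection to each patient.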
| Item Response Theory Maps |
|---|
These probability estimates are used to determine an individual's most likely position along the scale. When the assumptions of a particular IRT model are met, estimates of a person's ability do not strictly depend on a particular fixed set of items. This scaling feature allows one to compare people along a functional dimension even if they have not completed identical sets of items. Because items and scores are defined on the same scale, items can be optimally selected to provide good estimates of the domain at any level of the scale. This feature of IRT creates important flexibility in administering tests in a dynamic and tailored approach for each individual. Hambleton22 provided a more detailed explanation of IRT methods. Item response theory is currently being applied in physical therapy research to develop new measures, improve existing measures, investigate group differences in item and scale functioning, equate different instruments, and, as we highlight, develop better approaches to understanding the meaning of differences in scores. Jette and Haley23 and Ware and colleagues24 reviewed recent applications of IRT to rehabilitation tests and measures.
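The point that ability estimates do not depend on a fixed set of items can be illustrated with a maximum likelihood estimate under the Rasch model. This is a minimal sketch, not the PEDI scoring algorithm. The item bank and response patterns are hypothetical, and all-pass or all-fail patterns are rejected because their maximum likelihood estimates are infinite. Two children who answered different items are placed on the same logit scale.

```python
import math
from typing import Sequence


def rasch_p(theta: float, b: float) -> float:
    """Probability of passing an item of difficulty b at ability theta."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def estimate_theta(responses: Sequence[int], difficulties: Sequence[float],
                   tol: float = 1e-8, max_iter: int = 100) -> float:
    """Maximum likelihood ability estimate (Newton-Raphson) for 0/1 responses."""
    if len(responses) != len(difficulties):
        raise ValueError("one response per item is required")
    raw = sum(responses)
    if raw == 0 or raw == len(responses):
        raise ValueError("ML estimate is infinite for all-fail or all-pass patterns")
    theta = 0.0
    for _ in range(max_iter):
        probs = [rasch_p(theta, b) for b in difficulties]
        gradient = raw - sum(probs)                      # first derivative of log-likelihood
        information = sum(p * (1.0 - p) for p in probs)  # minus the second derivative
        step = max(-1.0, min(1.0, gradient / information))  # damped Newton step
        theta += step
        if abs(step) < tol:
            break
    return theta


# Hypothetical item bank (difficulties in logits)
bank = {"A": -2.0, "B": -1.0, "C": 0.0, "D": 1.0, "E": 2.0}

# Child 1 attempted the easier items A, B, C and passed A and B.
child_1 = estimate_theta([1, 1, 0], [bank["A"], bank["B"], bank["C"]])
# Child 2 attempted the harder items C, D, E and passed only C.
child_2 = estimate_theta([1, 0, 0], [bank["C"], bank["D"], bank["E"]])

print(round(child_1, 2), round(child_2, 2))
```

Here the second child passes fewer items but receives the higher estimate, because the items attempted were harder; a raw count of items passed would rank the two children the other way.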
In order to better interpret change in an individual patient, most physical therapists have an interest in the types of items that make up a total score on the measure. Using a one-parameter IRT model in its simplest form, in which item difficulty is used to locate dichotomous items along a scale, the clinician can examine the test from the perspective of a hierarchic set of items that serves as a representation of an underlying variable.25,26 Item response theory procedures take full advantage of modeling of individual items; therefore, one can examine changes in item responses from serial assessments at an item level.
The PEDI scoring profile and summary scores are based on Rasch IRT27 measurement technology. This approach provides an important hierarchical framework in which the construct validity and clinical utility of summary scores can be determined. A hierarchic scale defines a set of sequential tasks that represent increasingly more difficult functional items along a single dimension. The scales of the PEDI were specifically constructed to meet the objective of forming independent hierarchic dimensions. Each scale can be used to identify which functional items are relatively easy or more difficult for a child to achieve.
In the Figure, we have constructed an "item map" for the PEDI Functional Skills Mobility Scale corresponding to the case example, which allows us to define the specific items for which the child has shown capability and the items that he has yet to master. Because a child is expected to move along the continuum of hierarchically defined items, a summary score provides a clear indication of the child's performance level in that content domain, thus leading to an unambiguous interpretation of a summary score. Knowledge of specific content and location of items along the Functional Skills Mobility Scale can contribute to a richer understanding of the nature of mobility skill development and the interpretation of individual scores. For illustrative purposes in the case, we have arranged the entire set of items into 2 subsets: Transfers and Locomotion.
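An item map can be approximated by comparing a child's score with item locations on the same scale. Under a Rasch model, an item located at or below the child's measure has a probability of success of at least .5. In this sketch the item names and locations are hypothetical placeholders, not actual PEDI calibrations; only the admission (6.1) and 3-week (35.9) scores come from the case.

```python
from typing import Dict, List, Tuple

ItemMap = Dict[str, List[Tuple[str, float]]]

# Hypothetical item locations on a 0-100 scale; NOT actual PEDI calibrations.
ITEM_MAP: ItemMap = {
    "Transfers": [("transfer item 1", 5.0), ("transfer item 2", 20.0),
                  ("transfer item 3", 45.0)],
    "Locomotion": [("locomotion item 1", 10.0), ("locomotion item 2", 30.0),
                   ("locomotion item 3", 60.0)],
}


def split_items(score: float, item_map: ItemMap) -> Dict[str, Tuple[List[str], List[str]]]:
    """For each subset, separate items expected to be passed from those not yet mastered."""
    result = {}
    for subset, items in item_map.items():
        passed = [name for name, loc in items if loc <= score]
        not_yet = [name for name, loc in items if loc > score]
        result[subset] = (passed, not_yet)
    return result


def newly_expected(before: float, after: float, item_map: ItemMap) -> List[str]:
    """Items located between two scores, that is, skills expected to be newly acquired."""
    return [name for items in item_map.values()
            for name, loc in items if before < loc <= after]


admission, three_weeks = 6.1, 35.9  # case scores
print(split_items(admission, ITEM_MAP))
print(newly_expected(admission, three_weeks, ITEM_MAP))
# ['transfer item 2', 'locomotion item 1', 'locomotion item 2']
```

Listing the items that fall between two serial scores is one way to connect a summary score change to specific skills that could become new treatment goals.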
The potential utility of using item maps to track progress is to examine the specific item changes that are occurring during a physical therapy intervention program. This can be used to understand summary score changes, provide information to physical therapists about the pattern of skill changes, and perhaps suggest new items that might be the focus of revised patient goals. As IRT models of tests and measures used by physical therapists become more complex, such as tests using more than 1 parameter for estimation and response scales with more than 2 response choices (polychotomous), the item maps will become more complicated, but should still be informative. In addition, with the emerging use of computer-adapted testing applying the complex IRT models,28-30 it will be imperative that considerable thought goes into the development of computer-generated item maps in order to help clinicians interpret the summary scores from an item response level for an individual patient. Some work of this kind has been completed in educational applications,31 but has yet to be fully adapted to the tests and measures used by physical therapists.
| Minimal Important Difference |
|---|
What are some possible anchors that would help us understand scores or score change on a test and measure? Anchors might include self-reported opinions of individuals, including patients, family members, clinicians, or uninvolved judges. They often are collected by asking respondents to rate the amount of change in a particular area of health or function that has occurred during an episode of care. For example, if Mario's parents were asked to rate how much change had occurred during the current hospital episode, we would expect that the parents would identify that Mario has made a noteworthy change in function. We discuss some advantages and limitations of this approach below. Anchors also might include more objective indicators, such as laboratory values or disease markers. Other anchors may include return to expected recovery events, such as walking, wheelchair mobility, sports activity, work, school, independence in home, safety, or other important life activities or roles.
One of the most apparently obvious, but controversial, approaches to understanding change scores is to get information from the patient regarding his or her perception of change. For certain content areas, such as functional gain, pain, fatigue, quality of life, and others, the patient appears to be a good source of a global anchor for measures, even though this reasoning is fraught with a certain element of circularity. One of the limitations of anchor-based methods that rely on global ratings from a patient (eg, how much have you improved during your physical therapy treatment episode?) is that these retrospective ratings, particularly those focusing on an extended period of time, are susceptible to recall bias. For patients who are followed over long periods of time, longitudinal anchor-based methods are preferable to cross-sectional methods because the former are more temporally linked with change.32 In addition, global change questions often have unknown reliability and validity.3
Clinicians also may be appropriate candidates to provide an external assessment of patient change, although without proper training and rigor in making judgments about change, large variations may occur. Iyer et al33 recently reported an anchor-based study to determine the MID (also called "minimal clinically important difference" [MCID]) in the PEDI scales using physical therapists and others in an inpatient pediatric rehabilitation hospital as external anchors. An important difference is described as a "clinically important" change in patient function that is perceived as beneficial and that would change the patient's management.1 The "minimal important difference" is the smallest change in what is measured that is considered to be worthwhile or important to a patient.34
In the clinician-based anchor study by Iyer et al,33 the authors provided significant training and evaluated clinicians' performance on case examples before recording their global judgments of patient change. They asked therapists to "indicate how much this child changed from admission to discharge in capability to perform mobility skills (that were important to home/community functioning)." The therapists used a 15-point Likert scale and a visual analog scale to indicate how much better (or worse) the child was at discharge than upon admission. The authors collapsed the original clinician rankings into 4 categories (worse/no change, minimal change, moderate change, and large change). The minimal change category included original Likert scale points of "somewhat better" and "a little better." The average change in PEDI mobility scores for the group of children in the minimal change category was 8.7 points. Thus, using clinicians as an anchor for describing changes during inpatient rehabilitation programs, Iyer and colleagues defined a change of 8.7 points as representing a clinically meaningful level of change. In contrast, children who were identified as having moderate change on average had an admission-to-discharge change of 28.4 points, and those who were classified as making large changes had an average change of 58.7 points on the PEDI Functional Skills Mobility Scale. Most children admitted to an inpatient rehabilitation program are at a very low functional level on admission and thus rarely change in the negative direction. For many other acquired or progressive conditions, however, an analysis of change in both the positive and negative directions is warranted.
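The anchor-based calculation described above amounts to averaging change scores within each collapsed clinician-rating category. In this minimal sketch, only the mapping of "a little better" and "somewhat better" to the minimal change category comes from the study description. The function names and the records in the usage example are hypothetical and are not data from Iyer et al.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

# Per the study description, these two Likert labels form the minimal change
# category; any other label is passed through unchanged in this sketch.
MINIMAL_LABELS = {"a little better", "somewhat better"}


def collapse(label: str) -> str:
    return "minimal change" if label in MINIMAL_LABELS else label


def mean_change_by_category(records: Iterable[Tuple[str, float, float]]) -> Dict[str, float]:
    """records: (clinician_rating, admission_score, discharge_score) triples."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for rating, admission, discharge in records:
        groups[collapse(rating)].append(discharge - admission)
    return {category: mean(changes) for category, changes in groups.items()}


# Hypothetical ratings and scores, for illustration only
records = [
    ("a little better", 10.0, 17.0),
    ("somewhat better", 20.0, 30.0),
    ("much better", 5.0, 40.0),
]
estimates = mean_change_by_category(records)
print(estimates["minimal change"])  # 8.5
```

The mean of the minimal change category serves as the MID estimate; reporting the spread within that category alongside the mean would reflect the variability noted for such estimates.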
In our case example, using this anchor-based criterion, the child has changed from an admission score of 6.1 to 35.9 points (a difference of almost 30 points), and he has certainly exceeded an MID so far in his episode of care. He has even reached a point where one might consider his change to be more at the moderate level. Although this information is helpful in interpretation of the summary scores for this case, there are some important caveats to consider when using MIDs.
Minimal important differences have been shown to vary across patients and patient groups and to have limited generalizability.35 Different MID values may be obtained by using alternate anchors and methods; therefore, corroborating results across methods and multiple anchors will be important in future research. Furthermore, any estimate of the MID will be associated with a degree of uncertainty and variation. In the study by Iyer et al,33 although they report mean MIDs, there is considerable variability within each of the global change categories, so that reporting a range of MIDs might be preferable. Another limitation in the determination of an MID is the effect of initial placement of the patient on the scale. Patients who have very low initial scores at baseline (or admission) may have a greater ability to achieve an MID than those who start at a higher level on the scale. An additional concern is estimating an MID in certain groups with expected loss in function. Many studies simply use the absolute value of change scores, rather than separately evaluating improvement and deterioration.36
| MID Proportion |
|---|
| Combining Distribution- and Anchor-Based Methods |
|---|
In our case example, we add to the interpretation of the child's change scores by using the following empirical information collected in previous studies of children with traumatic brain injuries admitted to the inpatient rehabilitation program. An amount of change that is not likely to be measurement error is 5.1 points at a 90% CI (MDC90) or 6.0 points at a 95% CI (MDC95). The MID, based on a clinician anchor, is 8.7 points.33 These values are in the range of what might be expected from approaches that have used one half of a standard deviation39,40 (7.7 points in the case of the PEDI Functional Skills Mobility Scale) to identify CSD. From this collective work, we might consider a CSD on the PEDI Functional Skills Mobility Scale to lie in the range of 5 to 9 points, a requirement that the change seen in the case example clearly meets.
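The triangulation in the preceding paragraph can be written out directly. This sketch simply collects the benchmark values reported above (MDC90, MDC95, half of the 15.4-point SD, and the clinician-anchored MID) and checks the case change against each; the function name is hypothetical.

```python
from typing import Dict


def csd_benchmarks(sd: float, mdc90: float, mdc95: float, mid: float) -> Dict[str, float]:
    """Collect distribution-based and anchor-based thresholds for one scale."""
    return {"MDC90": mdc90, "MDC95": mdc95, "half SD": sd / 2.0, "MID": mid}


benchmarks = csd_benchmarks(sd=15.4, mdc90=5.1, mdc95=6.0, mid=8.7)
case_change = 35.9 - 6.1  # admission to 3 weeks

low, high = min(benchmarks.values()), max(benchmarks.values())
print(f"CSD range: {low:.1f} to {high:.1f} points")  # CSD range: 5.1 to 8.7 points
print({name: case_change > value for name, value in benchmarks.items()})
```

Keeping the thresholds side by side, rather than collapsing them into one number, makes it clear which criterion a given change does or does not satisfy.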
An additional set of information regarding the kinds of items that are changing is provided by the item map. Combining total score information within the context of MDC and MID values, and inspecting patterns of item changes using IRT methods, may yield the most informative data for physical therapists who are attempting to use tests and measures for the examination of individual patients.
| Recommendations |
|---|
| Footnotes |
|---|
This work was supported by an Independent Scientist Award (K02 HD45354-01) to Dr Haley and grant R01 HD043568 (Haley, principal investigator) from the National Institute of Child Health and Human Development (NICHD) and the Agency for Healthcare Research and Quality (AHRQ). Its contents are solely the responsibility of the authors and do not necessarily represent the official views of the NIH or AHRQ. This manuscript also was partially supported by the Noonan Foundation (Fragala-Pinkham, principal investigator). Dr Haley has stock interest in CRE Care, LLC, which distributes the Pediatric Evaluation of Disability Inventory (PEDI) products.
This article is based on a lecture by Dr Haley at the III STEP Symposium on Translating Evidence Into Practice: Linking Movement Science and Intervention; July 19, 2005; Salt Lake City, Utah.
Physical Therapy acknowledges that the authors retain the right to provide a copy of the final article to NIH upon acceptance for publication or thereafter, for public archiving in PubMed Central as soon as possible after publication by the Journal.
| References |
|---|