PHYS THER
Vol. 87, No. 4, April 2007, pp. 385-398
DOI: 10.2522/ptj.20060121

Corrections to this article: v87, p957; v87, p617

Research Reports

Prospective Evaluation of the AM-PAC-CAT in Outpatient Rehabilitation Settings

Alan M Jette, Stephen M Haley, Wei Tao, Pengsheng Ni, Richard Moed, Doug Meyers and Matthew Zurek

AM Jette, PT, PhD, is Director, Health and Disability Research Institute, School of Public Health, Boston University, 580 Harrison Ave, 4th Floor, Boston, MA 02218 (USA)
SM Haley, PT, PhD, is Associate Director, Health and Disability Research Institute, School of Public Health, Boston University
W Tao, BS, is Graduate Research Associate, Health and Disability Research Institute, School of Public Health, Boston University
P Ni, MD, MPH, is Research Assistant Professor, Health and Disability Research Institute, School of Public Health, Boston University
R Moed, MBA, is President, CRE Care, LLC, Boston, Mass
D Meyers, MBA, is National Director of Trends and Outcomes, HealthSouth Outpatient Services, HealthSouth Corporation, Birmingham, Ala
M Zurek, PT, is Vice President of Clinical Quality, HealthSouth Outpatient Services, HealthSouth Corporation

Address all correspondence to Dr Jette at: ajette{at}bu.edu


Submitted April 24, 2006; Accepted November 29, 2006


    Abstract
 
Background and Purpose: The purpose of this study was to prospectively evaluate the practical and psychometric adequacy of the Activity Measure for Post-Acute Care (AM-PAC) "item bank" and computerized adaptive testing (CAT) assessment platform (AM-PAC-CAT) when applied within orthopedic outpatient physical therapy settings.

Method: This was a prospective study with a convenience sample of 1,815 patients with spine, lower-extremity, or upper-extremity impairments who received outpatient physical therapy in 1 of 20 outpatient clinics across 5 states. The authors conducted an evaluation of the number of items used and amount of time needed to complete the CAT assessment; evaluation of breadth of content coverage, item exposure rate, and test precision; as well as an assessment of the validity and sensitivity to change of the score estimates.

Results: Overall, the AM-PAC-CAT's Basic Mobility scale demonstrated excellent psychometric properties while the Daily Activity scale demonstrated less adequate psychometric properties when applied in this outpatient sample. The mean length of time to complete the Basic Mobility scale was 1.9 minutes, using, on average, 6.6 items per CAT session, and the mean length of time to complete the Daily Activity scale was 1.01 minutes, using on average, 6.8 items.

Discussion and Conclusion: Overall, the findings are encouraging, yet they do reveal several areas where the AM-PAC-CAT scales can be improved to best suit the needs of patients who are receiving outpatient orthopedic physical therapy of the type included in this study.


    Introduction
 
Computerized adaptive testing (CAT), an outcome measurement approach for comprehensive and precise assessment of patient-related outcomes, is being used with increasing frequency in the health care field.1–3 This method of patient assessment uses a computer to administer test items to patients and is adaptive in the sense that each "test" is tailored to the unique level of each patient. Each person who takes an adaptive test is taking a different version of the test because the items are administered on the basis of the patient's previous responses. By selecting from a large "item bank" only those questions that provide the maximum amount of information given a person's responses to previous questions, CAT approaches avoid administering a large number of questionnaire items and allow for the rapid collection of accurate outcome information that can feasibly be implemented in busy clinical settings as well as in research settings.4

A CAT is programmed to first present an item from the mid-range of a predefined item bank of outcome questions and then to direct subsequent questions to the patient's most appropriate level based on his or her previous responses. With a comprehensive item bank available for each outcome domain of interest, the CAT algorithm selects only the items that are needed to provide a score estimate based on a predetermined number of items or a predetermined level of measurement precision. This allows for fewer items to be administered to each patient while gaining accurate information regarding an individual's placement along an outcome continuum.5 The development of comprehensive and methodologically sound item banks for each outcome of interest is a prerequisite to the development of psychometrically adequate CAT platforms that have clinical or research utility.

Item response theory (IRT) is both a theoretical framework and a collection of quantitative techniques used to construct outcome instruments, to scale responses to individual test items, and to equate scores, as well as to identify item bias and to facilitate CAT.3,6 With IRT, items are calibrated on the same scale that is used to measure a patient's functional ability. As such, the items are inherently linked to the scale both in terms of ability and the amount of information that an item provides at some point along the scale. In a CAT application, items are selected based on maximum information near the individual's estimated level of ability, thus avoiding the administration of items that are too easy or too difficult. This property of IRT supports an efficient selection of items during a CAT administration. In essence, the CAT software is programmed to select the items that provide optimal information, thus leading to a precise and efficient estimate of the patient's ability.7,8 This feature of CAT and IRT methods creates important flexibility in administering tests in a dynamic and tailored approach for each patient.

Although CAT applications for health care have been recommended for nearly a decade2,5,9 and a major set of papers on the subject was published in 2000,7,10,11 the literature has been limited largely to position papers,4,12–14 data simulations,1,15–19 or small-scale prospective research demonstrations.17,18,20

If CAT applications are going to become widely accepted as a means of monitoring health care outcomes, prospective evaluations should become more readily available in the clinical literature. To our knowledge, no previous study has evaluated the prospective use of CAT in health care environments. Building on our previous work,21–23 in this pilot study we prospectively evaluated the practical and psychometric adequacy of the Activity Measure for Post-Acute Care (AM-PAC) "item bank" and CAT assessment platform (AM-PAC-CAT) when applied within orthopedic outpatient physical therapy settings. Our evaluation consisted of 3 components: (1) a practical evaluation that included test efficiency of the CAT (ie, number of items used and amount of time needed to complete the CAT assessment); (2) a psychometric evaluation, including content range coverage, item exposure rate (IER), test precision, and person fit; and (3) an assessment of the validity and sensitivity to change of the score estimates derived by the AM-PAC-CAT.

In this study, we evaluated the Basic Mobility and Daily Activity scales of the AM-PAC-CAT.24 Our intent was to identify areas where the prototype AM-PAC-CAT instrument was working well and where the CAT could be improved to enhance its utility for use in outpatient rehabilitation and related clinical settings.


    Method
 
Instrument

The AM-PAC is an activity limitations instrument developed using the World Health Organization's International Classification of Functioning, Disability and Health (ICF).25 Within the ICF, an activity limitation is defined as "difficulty in the execution of a task or action by an individual."25(p14) In developing the AM-PAC-CAT, we used 2 different samples, for a combined sample size of more than 1,000 patients in post–acute care settings.17,22

We developed an initial pool of AM-PAC items based on input from measurement and content experts, suggestions from several focus groups of people with disabilities, and a comprehensive literature review. Some items were modified from existing functional instruments but were adapted for the difficulty or assistance response categories included in the AM-PAC. We framed the activity questions in a general fashion without specific attribution to health, medical conditions, or disabling factors. The AM-PAC data are collected by self-report, either through self-administration or through administration by a clinician or a trained data collector.

The Daily Activity scale item bank encompasses 65 distinct personal care and instrumental activities of daily living tasks. The Basic Mobility domain contains 120 basic physical activities such as bending, walking, carrying, or climbing stairs. Based on factor analytic work and IRT analyses,21,23 Basic Mobility and Daily Activity scale domains were identified and confirmed. A third AM-PAC domain—applied cognitive activity—was not included because it was judged by the clinical sites participating in this study as not relevant to this patient population.

The IRT modeling of the items was conducted using the generalized partial credit model (GPCM).26 The GPCM uses 2 parameters—item difficulty and discrimination—in estimating item locations and person scores and makes no assumptions regarding the similarity of item response categories across items. Adequate levels of reliability of individual items and validity of the AM-PAC have been established and have been reported previously.21,27
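As an illustration of the model named above, the following is a minimal sketch of the standard GPCM category-probability formula for a single item. The function name, argument names, and the example parameter values are hypothetical and are not taken from the AM-PAC calibrations.

```python
import math

def gpcm_probs(theta, discrimination, step_difficulties):
    """Category probabilities for one item under the generalized partial credit model.

    theta: person ability on the latent scale.
    discrimination: item discrimination parameter (a).
    step_difficulties: step parameters b_1..b_m; categories are scored 0..m.
    """
    # Cumulative logit for category k is the sum over v <= k of a * (theta - b_v);
    # the logit for category 0 is defined as 0.
    logits = [0.0]
    for b in step_difficulties:
        logits.append(logits[-1] + discrimination * (theta - b))
    # Subtract the largest logit before exponentiating for numerical stability.
    peak = max(logits)
    weights = [math.exp(z - peak) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical 4-category item (3 steps) evaluated for an able respondent.
example = gpcm_probs(theta=1.5, discrimination=1.2, step_difficulties=[-1.0, 0.0, 1.0])
```

Because each item carries its own list of step parameters, items may differ in the number and spacing of response categories, which is the property the paragraph above attributes to the GPCM.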

We developed a CAT version of the AM-PAC instrument (the prototype AM-PAC-CAT instrument) and have conducted a preliminary evaluation in samples of patients in post–acute care settings.24 The CAT software includes options for item selection, score estimation using the expected a posteriori (EAP) estimator method, and stopping rules based on the number of items or level of precision.
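The EAP estimator mentioned above can be sketched as a posterior mean computed over a grid of ability values. This is a generic illustration that assumes a standard normal prior and an evenly spaced grid; the prior, grid, and function names are assumptions and do not describe the authors' software.

```python
import math

def eap_estimate(likelihood, n_points=61, lower=-4.0, upper=4.0):
    """Expected a posteriori estimate of ability.

    likelihood: function mapping theta to the likelihood of the responses observed so far.
    Returns (posterior mean, posterior standard deviation) on the latent scale.
    ASSUMPTION: standard normal prior evaluated on an evenly spaced grid.
    """
    step = (upper - lower) / (n_points - 1)
    grid = [lower + i * step for i in range(n_points)]
    # Unnormalized posterior weight = prior density shape * likelihood.
    weights = [math.exp(-0.5 * t * t) * likelihood(t) for t in grid]
    total = sum(weights)
    mean = sum(t * w for t, w in zip(grid, weights)) / total
    variance = sum((t - mean) ** 2 * w for t, w in zip(grid, weights)) / total
    return mean, math.sqrt(variance)
```

In a CAT, the likelihood would be the product of the model probabilities of the responses given so far, and the posterior standard deviation is one quantity a precision-based stopping rule could monitor.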

In this study, we set a stop rule of administering no more than 7 items to each patient based on the participating clinics' desire to keep patient (and clinic staff) burden to an absolute minimum. We also used a content balancing algorithm that allowed AM-PAC items to be selected based on both content specifications and maximum information function for the first 3 items of the Basic Mobility scale and the first 4 items of the Daily Activity scale.28

The content balancing algorithm ensured that content chosen within the CAT item selection procedure was not limited to only one content aspect of the scale. For example, the first 3 items of the Basic Mobility scale were an item from each of the 3 major content areas: (1) bend/lift/reach/carry items, (2) mobility items, and (3) transfer items. Likewise, the CAT was programmed to select an item with the most information from one of each of the 4 Daily Activity scale content areas: (1) dressing items, (2) meal items, (3) grooming and hygiene items, and (4) instrumental activity items. Subsequent items in both scales then were selected based on maximum information at each iterative step.
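The selection logic described above can be sketched as follows. This is an illustrative reading, not the authors' implementation: the order in which content areas are visited, the data structures, and the item identifiers are all assumptions.

```python
def select_next_item(item_areas, administered, info_at_theta, required_areas):
    """Choose the next item to administer.

    item_areas: dict mapping item id -> content area.
    administered: list of item ids already given in this session.
    info_at_theta: dict mapping item id -> item information at the current score estimate.
    required_areas: content areas that must each be covered once before
        selection falls back to maximum information alone.
    Returns the chosen item id, or None if the bank is exhausted.
    """
    covered = {item_areas[i] for i in administered}
    pending = [area for area in required_areas if area not in covered]
    candidates = [i for i in item_areas if i not in administered]
    if pending:
        # Content-balancing phase: restrict to the next uncovered content area
        # (ASSUMPTION: areas are visited in the order listed).
        restricted = [i for i in candidates if item_areas[i] == pending[0]]
        candidates = restricted or candidates
    if not candidates:
        return None
    # Within the eligible set, take the item with the most information.
    return max(candidates, key=lambda i: info_at_theta[i])

# Hypothetical four-item bank.
bank = {"bend1": "bend", "walk1": "mobility", "walk2": "mobility", "chair1": "transfer"}
info = {"bend1": 0.2, "walk1": 0.9, "walk2": 0.8, "chair1": 0.5}
areas = ["bend", "mobility", "transfer"]
```

With this hypothetical bank, the first three selections cover one item per content area even though a mobility item carries the most information throughout; only the fourth selection is driven by information alone.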

Estimated AM-PAC scores for each subject in the sample were converted to norm-based scoring, which is a simple linear translation that expresses scores as deviations from a measure of central tendency. In this study, we used a mean of 50 and a standard deviation of 10. By using norm-based scoring instead of the more traditional 0–100 scale, as we raise the ceiling or lower the floor of a scale in the future by adding and calibrating new items, the placement (and scoring) of the item thresholds in relation to the average does not change. We based the CAT algorithms used in this study on software developed at the Health and Disability Research Institute, Boston University.
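The linear translation described above can be written in a few lines. The sketch assumes the underlying estimates are expressed relative to a reference mean and standard deviation (here 0 and 1, which is an assumption); only the target mean of 50 and standard deviation of 10 come from the text.

```python
def to_norm_based(theta, ref_mean=0.0, ref_sd=1.0, target_mean=50.0, target_sd=10.0):
    """Convert a latent-scale estimate to a norm-based score.

    ASSUMPTION: ref_mean and ref_sd describe the reference distribution of theta.
    """
    # Express theta as a deviation from the reference mean in SD units, then rescale.
    return target_mean + target_sd * (theta - ref_mean) / ref_sd
```

An estimate 1.5 reference standard deviations above the mean maps to 65, and adding harder or easier items later does not move where existing scores sit relative to 50.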

Subjects

Subjects for this study, conducted in 2005, consisted of a convenience sample of 1,815 patients with spine, lower-extremity (LE), or upper-extremity (UE) impairments who received outpatient physical therapy in 1 of 20 outpatient clinics across 5 states that were operated by HealthSouth's Outpatient Division Inc.

Background characteristics of the study sample, by major impairment grouping, are shown in Table 1. The sample was predominantly female, with a mean age between 46.8 and 51.4 years.



 
Table 1. Demographic Characteristics of the Study Sample, by Impairment Group

 
Data Collection

On their initial and discharge visits for physical therapy, subjects completed the self-report AM-PAC-CAT on a tablet computer provided to them in the clinic waiting room prior to their physical therapy visit. An office staff member was available to the subjects during the administration process to answer any questions. The 1,815 subjects included in this analysis completed both admission and discharge AM-PAC-CATs.

Subject demographic information, acuity level, surgical status, and major impairment were all available from administrative data collected routinely by each outpatient clinic. Reliability and validity data on these administrative data elements were not available. Acuity was defined as the number of days from the onset of the condition for which therapy was being sought to the admission visit to the physical therapy clinic. Payer source was defined as the primary source of payment for that physical therapy episode of care. Spine impairments included impairments of the cervical, thoracic, or lumbosacral region of the spine. Upper-extremity impairments included conditions of the shoulder, elbow, hand, or wrist. Lower-extremity impairments were conditions of the hip, knee, foot, or ankle.

Data Analysis

To evaluate the practical utility of the AM-PAC-CAT, we assessed the CAT's efficiency, which was defined as the number of CAT items administered per assessment and the amount of time taken to complete the CAT. In the psychometric evaluation, we assessed the content range of each scale item pool, IER, test precision, and model fit in this sample. We also evaluated scale score validity and sensitivity to change over the episode of care.

Content range coverage assessed how well the AM-PAC item bank captured the range of physical functioning experienced by the subjects in each Activity Limitation scale content domain. We examined potential ceiling effects (ie, the point at which subjects received the highest score) and floor effects (ie, the point at which subjects received the lowest possible score).

The IER identified which AM-PAC items were administered more often in the CAT application. Item exposure rate was defined as the ratio of the total number of times an item was administered to the total number of test occasions in a CAT study. Plots of the IER against item difficulty levels were constructed to detect possible relationships between frequencies of items being selected and their difficulty levels. The IER is influenced by the difficulty and discrimination of items, the distribution of ability of the patients, the presence of similar items in the item bank, and the specific content balancing specifications developed for each scale.29,30
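The IER definition above amounts to a simple count. A minimal sketch, with hypothetical item identifiers and session logs:

```python
from collections import Counter

def item_exposure_rates(sessions, item_bank):
    """Item exposure rate for every item in the bank.

    sessions: list of test occasions, each a list of administered item ids.
    item_bank: iterable of all item ids, so unused items are reported as 0.
    """
    if not sessions:
        raise ValueError("at least one test occasion is required")
    # Count each item at most once per test occasion.
    counts = Counter(item for session in sessions for item in set(session))
    n_occasions = len(sessions)
    return {item: counts[item] / n_occasions for item in item_bank}

# Hypothetical log of four CAT sessions drawn from a four-item bank.
rates = item_exposure_rates(
    [["a", "b"], ["a", "c"], ["a", "b"], ["a"]],
    ["a", "b", "c", "d"],
)
```

In this toy log, item "a" plays the role of a fixed starting item (rate 1.0) and item "d" is never administered (rate 0.0).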

Test precision was examined in this sample using the test information function (TIF). The TIF is a summary of information provided by individual items in the instrument and identifies where along the underlying scale the items have their best level of discrimination and measurement precision. Although the ideal for a CAT instrument is equal measurement precision (small standard errors of measurement) at all levels of ability, there is likely to be some variability of measurement precision for a certain group of people depending on their level of ability on the scale. The location on the scale where the test information curve peaks indicates the portion of the ability scale best measured by that instrument. When the test information peaks at around the same range on the scale as the patients' peak of ability distribution, the instrument is assumed to "fit" the population being measured.
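To make the idea of summing item information concrete, the sketch below uses the standard result that, for a partial credit type item scored 0..m, the information at a given ability equals the squared discrimination times the variance of the item score at that ability. The function names and example values are hypothetical.

```python
def item_information(discrimination, category_probs):
    """Information contributed by one polytomous item at a fixed ability level.

    category_probs: model probabilities of categories 0..m at that ability level.
    """
    mean_score = sum(k * p for k, p in enumerate(category_probs))
    mean_square = sum(k * k * p for k, p in enumerate(category_probs))
    # Information = a^2 * Var(item score | theta).
    return discrimination ** 2 * (mean_square - mean_score ** 2)

def total_information(items):
    """Test information at a fixed ability level: the sum over items.

    items: list of (discrimination, category_probs) pairs.
    """
    return sum(item_information(a, probs) for a, probs in items)
```

Evaluating `total_information` over a grid of ability values traces out the information curve; its peak marks the region of the scale that the chosen set of items measures best.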

Test information function values are closely related to the calculation of standard error (SE) of the person ability estimates. Specifically, the SE of the person ability estimate is inversely proportional to the square root of the TIF value: $\mathrm{SE} = 1/\sqrt{\mathrm{TIF}}$. To illustrate the precision levels of CAT scores at different ability levels, we also calculated the average SE of estimates for people at different score ranges. Confidence intervals (CIs) of the estimates were generated by multiplying the SE by a z score corresponding to a given confidence level.
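The relationship between information, SE, and CI can be sketched directly. The function names are illustrative; the SE is in whatever units the information is expressed in, and the numeric example reuses values reported in the Results (a score of 35 with an average SE of 2.16).

```python
import math

def standard_error(test_information):
    """SE of the ability estimate implied by a test information value."""
    return 1.0 / math.sqrt(test_information)

def confidence_interval(score, se, z=1.96):
    """Symmetric confidence interval around a score; z=1.96 gives a 95% CI."""
    half_width = z * se
    return score - half_width, score + half_width

# Values reported in the Results for a Basic Mobility score of 35.
low, high = confidence_interval(35.0, 2.16)
```

Rounded to 2 decimals, the interval is 30.77 to 39.23, matching the worked example in the Results.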

To assess sample fit to the CAT model, we estimated the degree to which the subjects' responses to items met the hierarchical assumptions of the fixed calibrations used in the CAT for the Basic Mobility and Daily Activity scales. For any IRT scale, an important assumption is that item difficulty locations on the underlying functional scale are similar for all people and that these locations have a predetermined hierarchy that applies to most individuals. To test this assumption, we used a standardized log-likelihood statistic (lz) for polytomous items to test for person fit.31 The empirical distribution of the log-likelihood statistic is reasonably close to a standardized normal distribution, so we calculated the percentage of administrations (both at admission and discharge) in which lz was statistically significant at an alpha level of .05.
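A standardized log-likelihood person-fit statistic of this kind can be sketched as the observed log-likelihood of a response pattern, centered by its expected value and divided by its standard deviation under the model. The code below follows that general form; it is an illustration under stated assumptions (local independence, model probabilities taken at the estimated ability), not the exact procedure of the cited reference.

```python
import math

def lz_statistic(responses, category_probs):
    """Standardized log-likelihood person-fit statistic for polytomous items.

    responses: observed category index for each administered item.
    category_probs: for each item, the model probabilities of its categories
        at the person's estimated ability (all strictly positive, and not all
        items uniform, or the variance below would be zero).
    """
    observed = expected = variance = 0.0
    for x, probs in zip(responses, category_probs):
        logs = [math.log(p) for p in probs]
        mean_log = sum(p * l for p, l in zip(probs, logs))
        observed += logs[x]       # log-likelihood of the response actually given
        expected += mean_log      # its expectation under the model
        variance += sum(p * (l - mean_log) ** 2 for p, l in zip(probs, logs))
    return (observed - expected) / math.sqrt(variance)
```

Large negative values indicate a response pattern that is less likely than the model expects, which is the situation described as person misfit.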

Validity of CAT score estimates was assessed using construct validation techniques. To provide evidence for construct validity of the AM-PAC-CAT scales, we compared AM-PAC-CAT scores between subjects with less than 35 acuity days (the median) and subjects with more than 35 acuity days and between subjects who had postsurgery treatment and those who had not. We hypothesized that earlier treatment following the onset of a condition and treatment after surgery would be associated with more improvement on both AM-PAC outcome scales.

Sensitivity to change was examined using one-sample dependent t tests to determine whether the increase in AM-PAC-CAT scores between admission and discharge from therapy was significantly greater than zero. In addition, we calculated the minimal detectable change (MDC) and the MDC proportion. The MDC is considered the minimal amount of change that is not likely to be due to measurement error. It is one of the more common distribution-based change indexes, which can be used to identify reliable changes in function, strength (force-generating capacity), and walking efficiency.32 The MDC can be reported at different confidence levels. We chose to report the MDC at both the 68% and 90% confidence levels (MDC68 and MDC90) in this article. The MDC proportion was calculated as the proportion of people whose change scores were equal to or above the MDC. In calculating the MDC, we used test-retest reliability estimates on the short-form AM-PAC from our earlier work.27
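The article does not spell out its MDC formula. A commonly used form, shown here as an assumption, derives the standard error of measurement (SEM) from the baseline standard deviation and a test-retest reliability coefficient and multiplies it by a z value and the square root of 2. The input values in the example are hypothetical.

```python
import math

def minimal_detectable_change(baseline_sd, reliability, z):
    """MDC at the confidence level implied by z.

    ASSUMPTION: SEM = SD * sqrt(1 - reliability) and MDC = z * SEM * sqrt(2);
    z is about 1.0 for a 68% level and 1.645 for a 90% level.
    """
    sem = baseline_sd * math.sqrt(1.0 - reliability)
    return z * sem * math.sqrt(2.0)

def mdc_proportion(change_scores, mdc):
    """Proportion of people whose change score is equal to or above the MDC."""
    return sum(1 for c in change_scores if c >= mdc) / len(change_scores)

# Hypothetical inputs: SD of 10 points and test-retest reliability of 0.91.
mdc90 = minimal_detectable_change(10.0, 0.91, 1.645)
```

With these hypothetical inputs the MDC90 is roughly 7 points, so only change scores of about 7 or more would count toward the MDC proportion.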


    Results
 
The mean length of time to complete the Basic Mobility scale was 1.9 minutes, using, on average, 6.6 items per CAT session from the Basic Mobility scale item pool. The mean length of time to complete the Daily Activity scale was 1.01 minutes, using on average, 6.8 items from the Daily Activity scale item pool. The percentages of cases using the maximum number of items (7 items) allowed in this application were: 72% for the Basic Mobility scale and 87% for the Daily Activity scale.

Content Coverage

The mean Basic Mobility scale scores at the admission and discharge visits, for the total sample and by impairment group, are listed in Table 2. The mean admission score of the total group was 63.3, and the mean discharge score of the total group was 68.7, an average increase of 5.4 units. When broken down into the 3 impairment groups, the UE group had the highest Basic Mobility scale scores and the LE group had the lowest scores in both admission and discharge sessions.



 
Table 2. Scale Distributions and Sensitivity to Change in the AM-PAC-CAT Basic Mobility Scale, by Impairment Group

 
There was neither a ceiling effect (highest possible AM-PAC score estimate for a subject) nor a floor effect (lowest possible AM-PAC score estimate for a subject) in the admission Basic Mobility scale, but on discharge, 10% of the total sample achieved the highest possible score. Ceiling effects were the greatest for the UE group, where 12.7% scored the highest value at discharge. Figure 1 shows that the admission scores were roughly normally distributed for each impairment group. However, in the discharge session (Fig. 1), the Basic Mobility scale scores were negatively skewed, illustrating some ceiling effect.



 
Figure 1. AM-PAC-CAT Basic Mobility scale score distribution at admission and discharge.

 
The mean Daily Activity scale scores at the admission and discharge visits, for the total sample and by impairment group, are shown in Table 3. The mean Daily Activity scale admission score of the total group was 57.0, and the mean discharge score of the total group was 60.9, an increase of 3.9 units. When broken down into the 3 impairment groups, the LE group had the highest Daily Activity scale scores and the UE group had the lowest scores in both admission and discharge sessions. There was no floor effect at either visit, but there were substantial ceiling effects. A large proportion of subjects scored at 65.3, very close to the maximum possible score of 67. Therefore, for this scale, we expanded the definition of ceiling effect to contain the score range from 65.3 to the maximum. In the admission session, 25% of the total sample displayed a ceiling effect on the Daily Activity scale. The greatest ceiling effect was seen for the LE group on admission, where 32.3% of the subjects scored at the ceiling on this scale. In the discharge session, almost half of the subjects scored at the ceiling on the Daily Activity scale, with the greatest ceiling effect (62.6%) seen for those subjects with an LE impairment. The frequency distributions presented in Figure 2 illustrate the negatively skewed distributions for the subjects at admission and at discharge.



 
Table 3. Scale Distributions and Sensitivity to Change in the AM-PAC-CAT Daily Activity Scale, by Impairment Group

 


 
Figure 2. AM-PAC-CAT Daily Activity scale score distribution at admission and discharge.

 
Item Exposure Rate

In the Basic Mobility scale item pool, one item ("Bending over to pick up something") was administered at every test occasion (IER=100%) because it was the predetermined starting rule. Eighteen items (15%) were not administered, and 81 items (67.5%) were exposed below 5% of the time. Table 4 displays the 21 Basic Mobility scale items that achieved an IER greater than 5% in the total sample across admission and discharge administrations.



 
Table 4. Most Frequently Used Functional Tasks From the AM-PAC-CAT Basic Mobility Scale Item Bank

 
In the Daily Activity scale item pool, 2 items were highly used (with an IER between 90% and 100%). All 65 items in the pool were used in this study, and a majority of the items (76.9%) had an IER below 5%. The 15 items with an IER greater than 5% are shown in Table 5.



 
Table 5. Most Frequently Used Functional Tasks From the AM-PAC-CAT Daily Activity Scale Item Bank

 
Figures 3 and 4 provide a chart of IER for the Basic Mobility and Daily Activity scale item pools by item difficulty level for the total sample. For the Basic Mobility scale item pool, although items on the upper half of the scale were exposed more often than items on the lower half of the scale, the distribution of the higher IER items was roughly even across the upper half of the scale. In contrast, for the Daily Activity scale domain, the higher IER items were clustered within a smaller range at the higher end of the scale. This pattern reflects the ceiling effect of the whole item bank illustrated in previously described results.



 
Figure 3. AM-PAC-CAT Basic Mobility scale item exposure rate by average calibration.

 


 
Figure 4. AM-PAC-CAT Daily Activity scale item exposure rate by average calibration.

 
Test Precision

Figures 5 and 6 contrast the TIFs for the full set of items and the TIFs for the items selected most often by the CAT (across both admission and discharge sessions). A higher level of information indicates greater measurement precision at that point along the scale. For the Basic Mobility scale domain, the TIF curve for the entire test pool peaked around 55 units on the ability scale, and the TIF curve for the 16 most frequently exposed items in CAT administrations shifted somewhat to the right and peaked at around 60. In the Daily Activity scale domain, the full item bank TIF curve peaked at around 40 on the ability scale. The TIF for the items administered most frequently by the CAT peaked at around 45. As expected from the distribution of scores, the most frequently used CAT items had optimal precision at higher levels of daily activity functioning than the overall TIF for the full item bank.



 
Figure 5. Activity Measure for Post–Acute Care Basic Mobility scale information curve.

 


 
Figure 6. AM-PAC-CAT Daily Activity scale information curve.

 
Table 6 presents the SE of estimates for subjects at different ability levels, which were calculated after combining the admission and discharge sessions for each scale. As shown in the table, for the Basic Mobility scale, scores between 50 and 69 were estimated the most precisely (SE=1.99). As the ability level moved farther away from this range, the precision level decreased. This table also presents the 95% CI for each score range, obtained by multiplying the average SE of the estimate by 1.96 (the z score for 95% confidence). For example, the average SE of the estimate for a Basic Mobility scale score between 30 and 49 is 2.16 points and the 95% confidence width is ±4.23 (1.96 × 2.16). Therefore, if a person scores 35, we are 95% confident that the true ability level of this person is between 30.77 (35 − 4.23) and 39.23 (35 + 4.23). For the Daily Activity scale, because of the ceiling effect, no subject scored above 70; thus, no SE is available for this range. The most precisely estimated score range is between 30 and 49, with SE equal to 1.98 points, and the least precisely estimated score range is between 50 and 69, with SE equal to 5.32 points.



 
Table 6. Average Standard Error (SE) at Different Ability Score Ranges for AM-PAC-CAT Basic Mobility and Daily Activity Scales

 
Model Fit

Person score misfit occurs when a person answers an item or items in a very unexpected way, given the estimate of functional ability from other item responses. Using the log-likelihood test, a misfitting item profile was detected in only 3% of the Basic Mobility scale test administrations and in only 2% of the Daily Activity scale test administrations.

Construct Validity

If both AM-PAC scales discriminated properly, we expected to see greater increases in basic mobility and daily activity for those subjects who were below the median level of acuity compared with those who were above the median level and for those who received postsurgical treatment compared with those who did not receive postsurgical treatment. As hypothesized, the data presented in Table 7 revealed that there were statistically significant differences in level of improvement in the Basic Mobility scale as a function of a subject's acuity level and his or her surgical status. The Daily Activity scale discriminated across acuity subgroups, but the difference was not statistically significant for the surgical status subgroups.



 
Table 7. Difference Scores (Mean±SD) for AM-PAC-CAT Basic Mobility and Daily Activity Scales, by Acuity and Postsurgical Treatment Groups

 
Sensitivity to Change

The sensitivity to change between admission and discharge visits of the Basic Mobility and Daily Activity scales is shown in Tables 2 and 3. The Basic Mobility and Daily Activity scales detected statistically significant mean score increases for the total sample and by each impairment group, with moderate to large effect sizes and standardized response means. Effect sizes for the Basic Mobility scale ranged from 0.34 for UE impairments to 0.91 for LE impairments. For the Daily Activity scale, the range was from 0.42 for spine impairments to 0.60 for UE impairments.
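The article reports effect sizes and standardized response means without restating their formulas. Under the usual definitions, assumed here, the effect size divides the mean change by the standard deviation of admission scores, and the standardized response mean divides it by the standard deviation of the change scores. The data in the example are hypothetical.

```python
import statistics

def effect_size(admission, discharge):
    """Mean change divided by the SD of admission (baseline) scores (assumed definition)."""
    changes = [d - a for a, d in zip(admission, discharge)]
    return statistics.mean(changes) / statistics.stdev(admission)

def standardized_response_mean(admission, discharge):
    """Mean change divided by the SD of the change scores (assumed definition)."""
    changes = [d - a for a, d in zip(admission, discharge)]
    return statistics.mean(changes) / statistics.stdev(changes)

# Hypothetical paired scores for four patients.
admission_scores = [50.0, 55.0, 60.0, 65.0]
discharge_scores = [55.0, 58.0, 68.0, 69.0]
```

The two indexes can diverge markedly for the same data: when patients improve by similar amounts, the change scores vary little and the standardized response mean exceeds the effect size.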

Among the 3 impairment groups, the LE impairment group experienced the highest Basic Mobility scale mean score increase (8.32 units), followed by the spine impairment group (4.83 units) and then by the UE impairment group (2.78 units). The Daily Activity scale also detected significant mean score increases for the total sample (3.9 units) and by each impairment group. Among the impairment groups, the UE impairment group experienced the highest mean Daily Activity scale score increase (5.65 units), followed by the LE impairment group (3.69 units) and then the spine impairment group (2.89 units).

For the Basic Mobility scale, 60% of the patient episodes exceeded the MDC68 and 49% exceeded the MDC90. For the Daily Activity scale, 50% of the patient episodes exceeded the MDC68 and 42% exceeded the MDC90. The proportion of patients who exceeded the MDC varied across impairment groups, as shown in Tables 2 and 3.


    Discussion and Conclusions
 
The CAT outcome instruments are intuitively appealing for use as quality monitoring tools within and across various clinical settings due to their promise of reducing respondent burden and minimizing data collection costs without sacrificing their psychometric properties. The findings from this study provide initial prospective evidence that CAT instruments can deliver on this promise. The 2 AM-PAC-CAT scales in this study required, on average, 6 to 7 items per scale to estimate AM-PAC scores for the 120-item Basic Mobility scale and the 65-item Daily Activity scale. These 2 AM-PAC-CAT scales were completed, on average, in under 2 minutes, making them practical to use in busy clinical settings. However, to be truly useful in tracking functional outcomes for the purpose of quality monitoring, CAT scales must meet several psychometric standards as well.

The present study is the first attempt, to our knowledge, to prospectively evaluate the psychometric utility of 2 CAT-based Activity Limitation outcome scales within an actual clinical setting for the purpose of monitoring functional outcomes. This evaluation included an assessment of the scale distributions and content coverage, particularly their ceiling and floor effects; CAT selection of items from the underlying item pool; precision of the test; and construct validity and sensitivity to change, along with an examination of how well the item banks fit this sample of patients receiving outpatient orthopedic physical therapy services. Overall, the findings are encouraging, yet they do suggest areas for improvement that would advance measurement in this sample.

The AM-PAC-CAT's Basic Mobility scale demonstrated excellent psychometric properties when applied in this outpatient rehabilitation sample. The frequency distributions were roughly normally distributed, with no floor effects and only modest ceiling effects (10% at discharge). Although the Basic Mobility scale was sensitive to change in all 3 impairment groups, it was most sensitive to change among those subjects with primary LE and spinal impairments. The effect size level for the Basic Mobility scale was 0.34 for subjects with UE impairments, but an effect size level of 0.91 was achieved for subjects with LE impairments. The greatest proportion of subjects exceeding the Basic Mobility scale MDC (66.1%) was seen in those with LE impairments. The Basic Mobility scale worked least well for subjects with UE impairments, which makes clinical sense when one considers that people with UE conditions are less likely to experience mobility limitations in the types of activities measured by this scale.

The Basic Mobility scale also discriminated well among subjects as a function of their acuity level and their postsurgical status. Among a pool of 120 items, the CAT relied primarily on 21 Basic Mobility scale items. While the CAT relied most frequently on those items in the upper half of the Basic Mobility scale, the distribution of the higher-end items was roughly even across the upper half of the scale. Considering that the AM-PAC was designed for patients in post–acute care inpatient and outpatient settings, one would expect the items used in an outpatient sample to be selected from the upper half of the item bank. Based on the TIF of the most frequently used CAT items, analyses revealed that the greatest measurement precision occurred when a person's Basic Mobility scale score was between 50 and 60, with less precision being achieved above and below this range. It is clear that new items located at the upper (better functioning) end of the Basic Mobility scale could help reduce ceiling effects and improve measurement precision.

The AM-PAC-CAT's Daily Activity scale demonstrated less adequate psychometric properties than the Basic Mobility scale in this outpatient sample. Analyses revealed several areas where the Daily Activity scale is in need of revision and improvement to best suit the needs of people in orthopedic outpatient settings. The frequency distributions of the Daily Activity scale scores were negatively skewed for subjects in each impairment group on admission to and at discharge from physical therapy care. There was a substantial ceiling effect in the Daily Activity scale scores for all 3 impairment groups, especially in the LE impairment group where the ceiling was reached by 32.3% of the subjects on admission and by 62.6% of the subjects at discharge. The Daily Activity scale discriminated among subjects as a function of their acuity status and detected significant mean score increases in function for all 3 impairment groups.

Despite the shortcomings of the Daily Activity scale, the group effect sizes achieved were substantial: the range was from 0.42 for spine impairments to 0.60 for UE impairments. The Daily Activity scale was most sensitive to change among those subjects with UE impairments, which also makes clinical sense because UE impairments are the type of condition most likely to affect personal care and performance of instrumental activities of daily living. Among a pool of 65 Daily Activity scale items, the CAT relied primarily on 15 items, which were predominantly located in the upper end of the item pool. The TIF curve for the items most frequently selected by the CAT on the Daily Activity scale revealed that these items provided less information for the subjects with Daily Activity scale scores above 60 units. For improved measurement precision at higher levels of functioning, particularly for improving the precision of individual score estimates, these findings suggest that the Daily Activity scale needs revision and addition of new items to make the scale more useful for outpatients of the type seen in this study.

An important advantage of CAT methodology, in contrast to traditional fixed-form measurement approaches, is the ability to readily update and improve the item bank as well as the CAT algorithms as problems are identified. Based on the results of this study, our research group has developed new questionnaire items for the Basic Mobility and Daily Activity scale item banks and has tested them, along with the existing AM-PAC-CAT scales, within a new sample of outpatients receiving physical therapy services. We are currently examining these new items in an IRT analysis to determine whether they fit the Basic Mobility or Daily Activity outcome scales, have content advantages over current items, and have locations on these outcome scales that fill in the content gap identified in the current study. Once these new IRT analyses are completed, the new items will be incorporated into the next revision of the Basic Mobility and Daily Activity scale item bank and CAT programs. In this sense, CAT outcome instruments can be viewed as dynamic, with the potential for continuous updating and improvement.

One of the concerns over using CAT-based outcome instruments is whether restricting the number of items administered to a patient (a maximum of 7 items in this study) could diminish the sensitivity of the instrument to change. With regard to this issue, it was encouraging to note that the effect sizes observed using the AM-PAC-CAT were comparable to those observed in similar types of patients followed with more traditional fixed-form functional outcome tools. For example, in this study, we observed an effect size of 0.91 with the AM-PAC Basic Mobility scale when used with patients with LE impairments. This finding compares with an effect size of 0.94 at 4 weeks for the Activities of Daily Living Scale in subjects with knee impairments,33 an effect size of 0.93 for the Lysholm Knee Rating Scale, and an effect size of 0.81 that was observed on the Physical Function scale of the 36-Item Short-Form Health Status questionnaire (SF-36) when applied in a sample of outpatients with knee impairments.35 In our study, the average AM-PAC-CAT Basic Mobility scale effect size was 0.62 in subjects with spinal impairments, which compares with an effect size of 0.70 that was observed on the Physical Function scale of the SF-36 when applied in a sample of outpatients with cervical and lumbar impairments.36 Future studies will be directed at evaluating the extent to which adding more than the 7 items per scale may improve upon the levels of sensitivity observed in this study.

Limitations

There are several limitations to the pilot study that should be noted. The first is that the subjects were a convenience sample of outpatients drawn from 20 outpatient practices. As with any convenience sample, we have no way of determining the extent to which these subjects represent the populations served by these clinics.

The reader also should note that only those subjects who completed both admission and discharge AM-PAC-CATs were eligible for our analyses. Securing discharge CATs in these busy clinical practices was a problem. Although the sample for this paper consisted of only 38% of all subjects who had completed the admission AM-PAC-CAT, those subjects who completed an admission AM-PAC-CAT but not a discharge AM-PAC-CAT were very similar to those who completed both instruments. Subjects who completed only the admission AM-PAC-CAT versus subjects who completed both admission and discharge AM-PAC-CATs were slightly younger (mean age=48 years versus 50 years), were more likely to have a spinal impairment (36% versus 32%), and were less likely to be receiving postsurgical treatment (24% versus 28%). The mean Basic Mobility and Daily Activity scale scores for patients who completed only the admission AM-PAC-CAT were 62.7 and 56.5, not statistically different from the mean scores of 62.9 and 56.8 for the final sample. Finally, we used test-retest estimates from an earlier study of the AM-PAC that was done with both inpatients and outpatients who were receiving post–acute care.27 The ideal approach would have been to derive test-retest estimates from a sample of subjects from orthopedic outpatient settings. We were unable to do so in this study, so we used the estimates from our earlier work. These methodological limitations should be kept in mind when interpreting our findings.

Implications

We believe that contemporary measurement methods such as IRT and CAT methodology present an exciting innovation that has the potential to transform the way in which patient-based outcome assessments are conducted within and across health care settings. The National Institutes of Health, for example, has recently included CAT approaches as part of its Roadmap and has funded major multi-year CAT projects to develop clinical research applications14 designed to ensure more uniformity in outcome endpoints used for clinical trials. Because CAT assessments provide an accurate, real-time measurement of outcomes, the CATs can readily be used to track patient-reported outcomes to clinical interventions, making them attractive for use in quality-monitoring systems applied across various clinical settings.37

We believe that the advantages of CAT-based instruments are likely to be maximized when applied across various post–acute care settings where the breadth of the CAT-based instrument will be of maximum advantage. For instance, when used to monitor patient outcomes across inpatient rehabilitation, nursing home, and home health care settings, the sensitivity of the AM-PAC-CAT has been shown to be superior to traditional setting-specific instruments such as the Functional Independence Measure.38,39

Future CAT development should include work that attempts to balance the utility of generating scores for groups of patients (as was done in this study) with a desire by clinicians to use these CAT assessments as a source of usable information for individual treatment planning and specific patient monitoring. Past efforts that have tried to use group-level outcome assessment tools for individual patient assessment have largely been disappointing.40 The problem is that group-level instruments yield imprecise and insensitive scores for individual patients. This problem might be solved using CAT methodology.2

In theory, it is possible to generate CAT item selection algorithms that would select items to be administered to a patient from the relevant underlying item pool based on clinical considerations as well as on maximizing information for the CAT estimate. Computerized adaptive testing methodology allows the user to obtain reliability estimates at the level of the individual person, thus facilitating the selection of a sufficient number of items for longitudinal assessment of individual change over time. An example of this individual patient approach, using a CAT version of the Pediatric Evaluation of Disability Inventory, was recently published.32 A challenge to developing CAT applications that are useful at the individual patient level in rehabilitation is to provide sufficient information at the patient level while minimizing patient response burden so that CATs remain feasible to use in clinical practice.
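The general idea of selecting items to maximize information while monitoring precision for the individual can be sketched as follows. This is not the AM-PAC-CAT software: it assumes dichotomous two-parameter logistic items, expected a posteriori scoring on a grid with a standard normal prior, and a stopping rule that ends the test at a hypothetical target standard error or at a maximum of 7 items. The item bank, the `answer` callback, and all parameter values are illustrative assumptions.

```python
import math

GRID = [g / 10.0 for g in range(-40, 41)]  # ability grid from -4 to 4 logits

def prob(theta, a, b):
    """Probability of endorsing a two-parameter logistic item."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def information(theta, a, b):
    """Fisher information of one item at ability theta."""
    p = prob(theta, a, b)
    return a * a * p * (1.0 - p)

def eap(responses):
    """Posterior mean and SD of theta under a standard normal prior."""
    weights = []
    for t in GRID:
        w = math.exp(-0.5 * t * t)
        for (a, b), x in responses:
            p = prob(t, a, b)
            w *= p if x == 1 else 1.0 - p
        weights.append(w)
    total = sum(weights)
    m = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - m) ** 2 * w for t, w in zip(GRID, weights)) / total
    return m, math.sqrt(var)

def run_cat(bank, answer, max_items=7, target_se=0.3):
    """Administer the most informative remaining item until a stop rule is met."""
    remaining = list(bank)
    responses = []
    theta, se = eap(responses)
    while remaining and len(responses) < max_items and se > target_se:
        item = max(remaining, key=lambda ab: information(theta, *ab))
        remaining.remove(item)
        responses.append((item, answer(item)))
        theta, se = eap(responses)
    return theta, se, [item for item, _ in responses]

# Hypothetical bank: equal discriminations, difficulties from -3 to 3 logits.
bank = [(1.5, d / 2.0) for d in range(-6, 7)]
# Hypothetical respondent who endorses every item easier than 1.0 logit.
theta, se, used = run_cat(bank, lambda item: 1 if 1.0 > item[1] else 0)
print(round(theta, 2), round(se, 2), len(used))
```

Lowering `target_se` or raising `max_items` trades additional respondent burden for more precise individual scores, which is the balance described above.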

If CAT outcome instruments such as the AM-PAC-CAT are shown to be beneficial for widespread application and use, a future challenge will be to develop effective and efficient methods to disseminate these innovations. It is essential not only that information about contemporary outcome instruments is communicated accurately and efficiently, but also that potential users understand what these instruments can offer and have the skill to appropriately implement them to assess functional outcomes. Without careful attention to dissemination and training, health care professionals may not know how to use these innovative tools and, consequently, outdated ordinal-scaled measures are likely to remain the outcome measurement norm for years to come.

To meet this challenge, new dissemination methods will need to be developed and implemented beyond the traditional methods of professional conference presentations and publication in scholarly journals.41 Funding mechanisms will need to be developed that will support these dissemination tasks at every level. Future users must be provided with the software needed to apply, analyze, and interpret CAT-based outcome instruments. This may require the development of continuing education seminars or high-quality technical assistance vehicles to assist rehabilitation professionals and organizations in their understanding, application, and interpretation of contemporary outcome measurement tools. Accreditation or professional organizations might be able to play a crucial role in facilitating this dissemination process.

In addition, efforts need to be made to ensure that future generations of clinicians are appropriately trained through the development of didactic courses and professional curricula on contemporary outcomes measurement. Specific courses on modern measurement technology can be incorporated into professional curricula as a new basic science in professional (entry-level) health professions education. Meeting this challenge will require efforts to educate faculty in the science of contemporary outcome measurement so that they have the skill to develop and deliver these courses to their future students. All of these dissemination steps are necessary to ensure that future generations of clinicians are familiar with and skilled in the application of contemporary outcomes measurement. Once developed and fully tested, these contemporary outcome instruments need to be widely disseminated and incorporated into clinical practice and research to improve our understanding of the effectiveness of health care interventions.


    Footnotes
 
Dr Jette and Dr Haley provided concept/idea/research design and writing. Mr Meyers and Mr Zurek provided data collection. Dr Jette, Dr Haley, Ms Tao, and Dr Ni provided data analysis. Dr Jette and Mr Moed provided project management. Dr Jette and Dr Haley provided fund procurement. Mr Zurek provided institutional liaisons. Mr Meyers and Mr Zurek provided subjects, facilities/equipment, and consultation (including review of manuscript before submission).

This study was approved by the Institutional Review Board of Boston University.

This study was supported by HealthSouth Corporation's Outpatient Division. It also was supported, in part, by an Independent Scientist Award (K02 HD45354-01) to Dr Haley.

Dr Jette, Dr Haley, and Mr Moed have stock interest in CRE Care, LLC, which distributes the Activity Measure for Post-Acute Care products.


    References
 

  1. Gardner W, Kelleher KJ, Pajer KA. Multidimensional adaptive testing for mental health problems in primary care. Med Care. 2002;40:812–823.
  2. McHorney CA. Generic health measurement: past accomplishments and a measurement paradigm for the 21st century. Ann Intern Med. 1997;127:743–750.
  3. Hays RD, Morales LS, Reise SP. Item response theory and health outcomes measurement in the 21st century. Med Care. 2000;38:II28–II42.
  4. Jette AM, Haley SM. Contemporary measurement techniques for rehabilitation outcomes assessment. J Rehabil Med. 2005;37:339–345.
  5. Revicki DA, Cella DF. Health status assessment for the twenty-first century: item response theory, item banking and computer adaptive testing. Qual Life Res. 1997;6:595–600.
  6. Cook KF, Monahan PO, McHorney CA. Delicate balance between theory and practice: health status assessment and item response theory. Med Care. 2003;41:571–574.
  7. Hambleton RK. Emergence of item response modeling in instrument development and data analysis. Med Care. 2000;38:II60–II65.
  8. Embretson S, Reise S. Item Response Theory for Psychologists. Mahwah, NJ: Lawrence Erlbaum Associates; 2000.
  9. Bjorner JB, Ware JE Jr. Using modern psychometric methods to measure health outcomes. Medical Outcomes Trust Monitor. 1998;3(2):14–18.
  10. Cella D, Chang C-H. A discussion of item response theory and its applications in health status assessment. Med Care. 2000;38:66–72.
  11. Ware JE Jr, Bjorner JB, Kosinski M. Practical implications of item response theory and computerized adaptive testing: a brief summary of ongoing studies of widely used headache impact scales. Med Care. 2000;38:II73–II82.
  12. Ware JE Jr. Conceptualization and measurement of health-related quality of life: comments on an evolving field. Arch Phys Med Rehabil. 2003;84:S43–S51.
  13. Cook KF, O'Malley KJ, Roddey TS. Dynamic assessment of health outcomes: time to let the CAT out of the bag? Health Serv Res. 2005;40:1694–1711.
  14. Fries J, Bruce B, Cella D. The promise of PROMIS: using item response theory to improve assessment of patient-reported outcomes. Clin Exp Rheumatol. 2005;23:S53–S57.
  15. Hart DL, Mioduski JE, Stratford PW. Simulated computerized adaptive tests for measuring functional status were efficient with good discriminant validity in patients with hip, knee, or foot/ankle impairments. J Clin Epidemiol. 2005;58:629–638.
  16. Hart DL, Cook KF, Mioduski JE, et al. Simulated computerized adaptive test for patients with shoulder impairments was efficient and produced valid measures of function. J Clin Epidemiol. 2006;59:290–298.
  17. Haley SM, Ni PS, Hambleton RK, et al. Computer adaptive testing improves accuracy and precision of scores over random item selection in a physical functioning item bank. J Clin Epidemiol. 2006;59:1174–1182.
  18. Ware JE Jr, Gandek B, Sinclair SJ, Bjorner JB. Item response theory in computer adaptive testing: implications for outcomes measurement in rehabilitation. Rehabil Psychol. 2005;50:71–78.
  19. Dijkers MP. A computer adaptive testing simulation applied to the FIM instrument motor component. Arch Phys Med Rehabil. 2003;84:384–393.
  20. Haley SM, Fragala-Pinkham MA, Ni PS. Sensitivity of a computer adaptive assessment for measuring functional mobility changes in children enrolled in a community fitness program. Clin Rehabil. 2006;20:616–622.
  21. Haley SM, Coster WJ, Andres PL, et al. Activity outcome measurement for post-acute care. Med Care. 2004;42:I-49–I-61.
  22. Haley SM, Andres PL, Coster WJ, et al. Short-form activity measure for post-acute care (AM-PAC). Arch Phys Med Rehabil. 2004;85:649–660.
  23. Coster WJ, Haley SM, Andres PL, et al. Refining the conceptual basis for rehabilitation outcome measurement: personal care and instrumental activities domain. Med Care. 2004;42:I62–I72.
  24. Haley SM, Coster WJ, Andres PL, et al. Score comparability of short-forms and computerized adaptive testing: simulation study with the Activity Measure for Post-Acute Care (AM-PAC). Arch Phys Med Rehabil. 2004;85:661–666.
  25. International Classification of Functioning, Disability and Handicap (ICF). Geneva, Switzerland: World Health Organization; 2001.
  26. Muraki E. A generalized partial credit model. In: van der Linden W, Hambleton RK, eds. Handbook of Modern Item Response Theory. New York, NY: Springer-Verlag New York Inc; 1997:153–168.
  27. Andres PL, Haley SM, Ni PS. Is patient-reported function reliable for monitoring post-acute outcomes? Am J Phys Med Rehabil. 2003;82:614–621.
  28. Kingsbury G, Zara A. A comparison of procedures for content-sensitive item selection in computerized adaptive testing. Applied Measurement in Education. 1991;4:241–261.
  29. Revuelta J, Ponsoda V. A comparison of item exposure control methods in computerized adaptive testing. J Educ Meas. 1998;35:311–327.
  30. Stocking M, Lewis C. Controlling item exposure conditional on ability in computerized adaptive testing. J Educ Behav Stat. 1998;23:57–75.
  31. Drasgow F, Levine M, Williams E. Appropriateness measurement with polytomous item response models and standardized indices. Br J Math Stat Psychol. 1985;38:67–86.
  32. Haley SM, Fragala-Pinkham MA. Interpreting change scores of tests and measures used in physical therapy. Phys Ther. 2006;86:735–743.
  33. Irrgang JJ, Snyder-Mackler L, Wainner RS, et al. Development of a patient-reported measure of function of the knee. J Bone Joint Surg Am. 1998;80:1132–1145.
  34. Tegner Y, Lysholm J. Rating systems in the evaluation of knee ligament injuries. Clin Orthop. 1985;190:43–49.
  35. Jette DU, Jette AM. Physical therapy and health outcomes in patients with knee impairments. Phys Ther. 1996;76:1178–1187.
  36. Jette DU, Jette AM. Physical therapy and health outcomes in patients with spinal impairments. Phys Ther. 1996;76:930–941.
  37. Wilkerson DL, Johnston MV. Clinical program monitoring systems: current capability and future directions. In: Fuhrer MJ, ed. Assessing Medical Rehabilitation Practices: The Promise of Outcomes Research. Baltimore, Md: Paul H Brookes Publishing Co Inc; 1997:275–306.
  38. Coster WJ, Haley SM, Jette AM. Measuring patient-reported outcomes after discharge from inpatient rehabilitation settings. J Rehabil Med. 2006;38:237–242.
  39. Haley SM, Siebens H, Coster WJ, et al. Computerized adaptive testing for follow-up after discharge from inpatient rehabilitation. Arch Phys Med Rehabil. 2006;87:1033–1042.
  40. McHorney C, Tarlov A. Individual-patient monitoring in clinical practice: are available health status surveys adequate? Qual Life Res. 1995;4:293–306.
  41. Farkas M, Jette AM, Tennstedt S, et al. Knowledge dissemination and utilization in gerontology: an organizing framework. Gerontologist. 2003;43:47–56.




