PHYS THER
Vol. 81, No. 6, June 2001, pp. 1233-1252


Perspectives

Making Geriatric Assessment Work: Selecting Useful Measures

Jessie M VanSwearingen and Jennifer S Brach

JM VanSwearingen, PT, PhD, is Associate Professor of Physical Therapy, Department of Physical Therapy, School of Health and Rehabilitation Sciences, University of Pittsburgh, 6035 Forbes Tower, Pittsburgh, PA 15260 (USA) (jessievs+@pitt.edu).
JS Brach, PT, PhD, GCS, is Clinical Assistant Professor of Physical Therapy, Department of Physical Therapy, School of Health and Rehabilitation Sciences, University of Pittsburgh, and a postdoctoral fellow, Department of Epidemiology, Graduate School of Public Health, University of Pittsburgh.

Address all correspondence to Dr VanSwearingen.



    Abstract
 
Often the goal of physical therapy is to reduce morbidity and prevent or delay loss of independence. The purpose of this article is to describe issues to consider when selecting measures of physical function for use with community-dwelling adults over the age of 65 years. We chose 16 measures of physical function for review because they have been used in studies of community-dwelling older adults and some psychometric properties of reliability and validity have been described in the literature. Three major issues are discussed: (1) appropriateness of the measure for community-dwelling older adults, (2) practical aspects of test administration, and (3) psychometric properties. These issues are illustrated using examples from the 16 measures. Two scenarios, applying the measures to the assessment of physical performance of community-dwelling well older people and to the assessment of physical performance of community-dwelling frail older people, are used to illustrate how this information can be used.

Key Words: Geriatric assessment • Physical functioning


    Introduction
 Top
 Abstract
 Introduction
 Selecting Measures for Review
 Selecting a Measure for...
 Selecting a Measure:...
 Selecting a Measure:...
 Selecting a Measure:...
 Review of Selected Measures
 Summary
 References
 
Assessment of a person's physical performance is part of the health care management of older people,1,2 particularly for physical therapists for whom the goal of intervention is often to improve function or reduce morbidity.3 Health care professionals need to recognize who has a problem,4 to determine when interventions are necessary and often what those interventions should be, to select outcome measures, to predict physical function, and to plan for the public health needs of older adults.4–7 The impact of geriatric assessment and interventions on morbidity,1,2,8–10 such as improved performance of basic and instrumental activities of daily living (BADL and IADL) and optimal independence, remains to be demonstrated. We contend that the selection of appropriate measures is important for determining the effectiveness of geriatric assessment and interventions in reducing morbidity. Despite the number of older people living in the community rather than in institutions,11 the literature has little information about the physical performance of community-dwelling older adults.

The focus of this article is to provide our views on the appropriateness of measures for use, the practicality of administration, and some of the psychometric properties of some measures of physical function in community-dwelling older adults over the age of 65 years. Our objective is to provide clinicians working with community-dwelling older people with a guide to be used in selecting measures of physical function.

Our intent was to review only measures of physical function at the disability level (ie, as defined by the World Health Organization's International Classification of Impairments, Disabilities, and Handicaps disablement scheme12) and not to examine measures of individual body systems (ie, impairment level). Measures of balance or fitness, although sometimes considered as impairment-level measures, typically, in our opinion, represent assessment of the interaction of multiple body systems (much like gait) in the performance of the tests. Thus, we have included balance and fitness in our review.


    Selecting Measures for Review
 
Measures selected for review had to meet certain basic criteria. They must have been: (1) developed for and tested among community-dwelling older people, (2) shown to be measures that could be applied in almost any clinical setting with minimal equipment, cost, or special requirements, (3) described in peer-reviewed studies of community-dwelling older people, and (4) reported to have some form of reliability and validity.

The measures reviewed are listed in Table 1. We organized the description of the measures into 3 categories: (1) comprehensive physical performance of activities of daily living (ADL), (2) mobility and balance, and (3) fitness for activity. Within each category, performance-based measures are discussed first, followed by a discussion of self-report measures.


Table 1. Measures Reviewed Based on Results of MEDLINE Searcha

 

    Selecting a Measure for Clinical Use
 
Previous investigators studying outcomes of interventions in older adults defined 3 major considerations: (1) appropriateness to the target population, (2) practical aspects of test administration, and (3) psychometric properties.13,14 In this article, we discuss the selection of physical function measures relative to these 3 broad concerns. We recommend that readers should consult a more detailed guide of "things to consider" in selecting measures, the Standards for Tests and Measurements in Physical Therapy Practice,* for additional direction in choosing assessment instruments.


    Selecting a Measure: Appropriateness to the Target Population
 
We believe that measures should be chosen based on whether they were designed for and have been used with people similar to the people to be measured.16,17 For example, if a measure is to be used to determine whether an older patient in the hospital recovering from pneumonia and nearing discharge has recovered physical function adequate to return home, we believe that a measure of physical function in ADL designed and examined for use with community-dwelling older people should be chosen. We do not believe that the Barthel Index of ADL,18 designed for use with institutionalized older adults, would be appropriate. In our view, the Physical Performance Test (PPT), which was designed and tested on a sample of community-dwelling older people, could enable the user to describe the physical function of the older person.19

Limitations in measurements, such as ceiling or floor effects, can usually be avoided by selecting measures that have been demonstrated to provide meaningful information about people who are similar to those being measured. Ceiling or floor effects, meaning a large number of individuals receive the maximum or minimum score, limit the ability of a test user to show change.16 Measures for which a sizable proportion of those measured perform at the ceiling or floor level typically fail to provide meaningful information.20

For example, in one study of self-reported physical performance of community-dwelling older people using the Functional Status Questionnaire (FSQ), 64% of the people obtained the highest or ceiling level score for the BADL subscale and 23% scored the highest for the IADL subscale.20 The FSQ would then have limited ability to provide much information about improvements in physical function because many older people scored the maximum prior to the intervention. In contrast, for the same group of community-dwelling older people, only 4% of the people scored at that level for the PPT, and 8% performed at that level for the Medical Outcomes Study 36-Item Short-Form Health Survey (SF-36) physical function subscale. For all of the scales (ie, FSQ-BADL, FSQ-IADL, 7-item PPT, and SF-36 physical function subscale), 0% of the people scored at the lowest or floor level.20 If the purpose of the measure is to describe performance and to be able to monitor change (eg, response to treatment or decline in physical function), the wide range of scores represented in the sample using the PPT and the SF-36 physical function subscale indicates that these 2 measures may be more useful than the FSQ.
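The ceiling and floor percentages discussed above can be checked for any set of scores with a few lines of code. The following is a minimal sketch in Python; the function name `ceiling_floor_percent` and the example scores are hypothetical and are not data from the study cited.

```python
def ceiling_floor_percent(scores, min_score, max_score):
    """Return (percent at floor, percent at ceiling) for a list of scores."""
    if not scores:
        raise ValueError("scores must not be empty")
    n = len(scores)
    at_floor = sum(1 for s in scores if s <= min_score)
    at_ceiling = sum(1 for s in scores if s >= max_score)
    return 100.0 * at_floor / n, 100.0 * at_ceiling / n


# Hypothetical scores on a 0-100 subscale for 10 people (not study data).
example_scores = [100, 100, 85, 100, 70, 100, 100, 90, 100, 60]
floor_pct, ceiling_pct = ceiling_floor_percent(example_scores, 0, 100)
print(f"floor: {floor_pct:.0f}%, ceiling: {ceiling_pct:.0f}%")
# prints "floor: 0%, ceiling: 60%"
```

A measure showing a result like this in the people of interest would leave most of them no room to show improvement.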

Face validity indicates whether a measure appears to have been designed to measure what it is supposed to measure, and, for our purposes, that is physical function of older people. Face validity, while contributing to validity of the data obtained with a measure, is not represented by the outcome of a statistical test but by the judgment of the tester that the measure has been used under similar conditions of measurement.21 In our view, appropriateness for those being measured and face validity appear to be the first considerations for selecting a measure. This, in our view, quickly narrows the possible choices. For example, the Timed "Up & Go" Test (TUG) was designed and tested as a measure of mobility.22 The recommended interpretation of TUG scores is based on distinguishing older people who are mostly independent in mobility and chair-to-stand transfers (TUG score of <20 seconds) from those who need assistance in most mobility and chair-to-stand transfers (TUG score of ≥30 seconds).22 The relationship between timed scores for the TUG and risk of falling for older people has not been determined. Thus, we believe that the TUG could not be expected to provide useful information about the risk of an older person falling (but see VanSwearingen23).


    Selecting a Measure: Practicality
 
Practicality should be considered when choosing a test. Factors to consider are: (1) the time needed to administer the test, (2) the experience needed by the person administering the test (eg, professional or technical), (3) whether administering the test requires prior experience or formal training, (4) the equipment needed, (5) the format of the test (self-report or performance-based), (6) the method of scoring (eg, manually or computer-assisted), and (7) the format of the resulting measurements.16,24,25

Practicality: Time Needed to Administer the Test

In choosing a test, the time to complete the test and the patient's ability to tolerate testing without fatigue should be considered. For example, although patients may appear to have the time to complete self-report measures in the waiting room before being seen for therapy, we believe that lengthy or numerous self-report forms could interfere with patient care. Some older people (as well as some younger people), in our opinion, may fatigue while completing self-report forms, and this fatigue could influence their responses. We contend, therefore, that the SF-3626 may be preferred for use instead of the Sickness Impact Profile (SIP)27 for self-report of physical function in ADL because the SF-36 is brief compared with the SIP, which contains 136 questions in 12 categories.

Practicality: Experience of the Person Administering the Test

The issue of skill may influence the choice of a test. In our view, nonprofessional support personnel, with even minimal training, may obtain measurements using the Functional Reach Test,28 the TUG, gait speed,29,30 chair rise time,31 and balance measures.32 In contrast, some measures of mobility and balance such as the Berg Balance Scale33 and the Modified Gait Abnormality Rating Scale (GARS-M),34 in our opinion, appear to require and have been tested using only the expertise of a professional to make judgments about performance relative to standards for the items of the test. To a lesser degree, we assume that the Performance-Oriented Mobility Assessment (POMA)35 requires professional expertise, although much of the reported use of the POMA does not appear to have involved a physical therapist in the application of the test.36–38

Timed tests of physical function often result in scores based on the time needed to complete a task. For example, on the PPT, item scores range from 0 to 4, based on the time for completion of the task, with the exception of one item that involves a criterion-based judgment (ie, turning 360°). We believe that the PPT can be administered by support personnel or by a physical therapist with similar results.

Practicality: Administering the Test Requires Prior Experience or Formal Training

Instructions for administration of a test sometimes are not sufficient, and special training and experience are needed to achieve an acceptable level of accuracy in measurement.21,39 The Berg Balance Scale33 and the GARS-M34 have been described in reports of studies in which the raters were trained. Both tests have instructions for testing individual items; however, initial reliability and validity testing of the measures was based on raters who compared item scoring on trial testing of sample patients, discussing discrepancies in scoring until agreement was reached prior to the actual use of the measure.33,34 Other users could not expect similar accuracy of measurement without similar experience and training and would need to determine the reliability for specific conditions of measurement that differ from those previously described.

The seated step test40 does not involve what we would consider a novel measurement scale or calibration of raters, as we believe the GARS-M and the Berg Balance Scale do, and can be administered, in our opinion, by any personnel trained in recording vital signs. In practice, however, seated step tests40 require the tester to be experienced in manually recording an exercising heart rate and blood pressure. The Six-Minute Walk Test,41 based on distance walked and not a recording of exercising vital signs, might serve, in our opinion, as a more reasonable measure of endurance than the seated step tests, if testing personnel have limited experience with the measurement of vital signs.

Practicality: Equipment Needed

The equipment needed to obtain some measurements of physical function can be costly, as can time for setup and operation, in addition to the space required for equipment. If the equipment is not portable, transport or operation of the equipment in settings where community-dwelling older people are measured (eg, free-standing outpatient clinics, home care) may be difficult or even impossible. We selected the measures discussed in this review because we believe that they can be used to obtain measurements in a variety of settings, with minimal equipment costs and without special setups to obtain the measurements (eg, obstacle course).

Practicality: Format of the Measure (Performance-based or Self-Report)

Guralnik et al21 described the advantages and disadvantages of performance-based and self-reported measures for older people. Performance-based measures typically are used to determine what older people can do in the setting in which the measurements are obtained (often a clinic, which is much different from the home or usual living environment).21 Proponents of self-reported measures have argued that these measures reflect what a person does in a more usual or familiar setting.39 Some self-reported measures have been demonstrated to yield reliable measurements42,43 and are correlated with performance-based measures of physical function.20,44 In 1995, Reuben et al20 reported that the information obtained with performance-based and self-report measures of physical function is complementary. Thus, selecting a performance-based measure of physical function (eg, the PPT) and a self-report measure of physical function (eg, the SF-36 physical function and physical role function subscales) may maximize the description of physical function of the older patient.44,45 For example, the performance-based measure indicates an older person's ability to perform functional tasks, whereas the self-report measure indicates a person's usual performance or perception (opinion) of his or her ability to perform functional tasks.39 Self-report measures may be desirable when there are no professional or support staff to obtain performance-based measurements.

Practicality: Method of Scoring

Scores may be obtained while subjects are tested (eg, Functional Reach Test, distance reached in inches), derived from scores assigned to ratings of individual items (eg, PPT, FSQ, GARS-M), or scored through a coding system based on rules for combining various item scores for different subscale measures (eg, SF-36). The SF-36 and the SIP have complex processes (eg, differential weighting of item scores, combining items from multiple sections of the instruments) for calculating scores for each subscale of the instruments. Computer programs can be used for scoring the SF-36 and SIP, reducing the burden on the test administrator, but this delays scoring and interpretation of findings. We believe there is an argument for a clinician interested in immediate feedback and interpretation using the FSQ, a self-report of physical function during ADL. Items have individual point values assigned relative to the items, and subscale scores are transformed from the total for the questions answered to a 100-point basis with basic mathematical processes of addition, subtraction, division, and multiplication.46 Although composite scores transformed to a 100-point basis result in a score for which the relative value is easily recognized (eg, 50/100 suggests that performance is 50% of maximum or of some criterion of performance), it is also true that transformed scores obscure clinical meaning and magnify change in the measurement, which is potentially a problem in statistical analyses.24
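To illustrate how a subscale total for the questions answered can be placed on a 100-point basis, here is a minimal sketch in Python. It uses a generic linear rescaling and is not the published FSQ scoring algorithm; the function name `rescale_to_100`, the 1-to-4 item range, and the example responses are assumptions made for illustration only.

```python
def rescale_to_100(item_scores, item_min, item_max):
    """Linearly rescale the answered items of a subscale to a 0-100 score.

    Unanswered items are passed as None and skipped, so the result is
    based only on the questions answered.
    """
    answered = [s for s in item_scores if s is not None]
    if not answered:
        raise ValueError("no answered items to score")
    if item_max <= item_min:
        raise ValueError("item_max must exceed item_min")
    raw = sum(answered)
    lowest = item_min * len(answered)    # worst possible total
    highest = item_max * len(answered)   # best possible total
    return 100.0 * (raw - lowest) / (highest - lowest)


# Hypothetical 3-item subscale, items scored 1-4, one item left blank.
score = rescale_to_100([4, None, 2], 1, 4)
print(f"{score:.1f}")
# prints "66.7"
```

Note that the 2-point difference between the two answered items becomes a difference of tens of points after rescaling, which is one way transformed scores can magnify change.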

Practicality: Format of Results or Reported Scores

All of the instruments reviewed in this article result in a single composite score or in a composite score and subscale scores for components of the item being measured. A single composite score can be desirable for communicating findings to others24 and for identifying older people who are at risk for difficulty in physical function. However, a single composite score may not represent an older person's physical function for a specific task or for all categories of performance.24 Subscale scores for components of physical function, in our opinion, may be more useful for planning intervention and monitoring outcomes. Gait speed may be a sensitive measure of mobility and fall risk3 and responsive to intervention,47 but this measure may not provide the clinician with insight into problems in gait.


    Selecting a Measure: Psychometric Properties
 
According to Kirshner and Guyatt,48 clinical tests and measures have 3 uses: (1) to discriminate between individuals, (2) to predict a result or expected outcome, or (3) to evaluate change with time. The patient management model used in the Guide to Physical Therapist Practice49 describes the following uses of clinical measures: (1) examination, evaluation, and diagnosis (discrimination), (2) prognosis (predict a result), or (3) intervention (evaluate change).

Reliability

Reliability indicates the degree to which 2 measures are alike.15,50 Intraclass correlation coefficients (ICCs)—for quantitative, continuous data—and the Kappa statistic—for determining agreement of categorical data—are often used to represent reliability for measures used in examination, but other statistical tests also are used. Unlike the Kappa statistic, the percentage of agreement, when used to reflect reliability, can be used to estimate closeness of sets of measures, but it does not correct for chance agreement and may not represent reliability relative to any clinical group of patients. Statistical indexes of association (eg, Pearson product moment correlation coefficient [Pearson r], Spearman rho) indicate how 2 measures vary together, but they do not indicate the agreement between the measures. Thus, the Pearson correlation coefficient is rarely appropriate for representing reliability, whereas the ICC, indicating the degree of common variance between measures for continuous data, is a better representative of agreement.50 The standard error of the measurement (SEM) is the most desirable statistic for estimating reliability of measurements obtained with an instrument.50 However, the SEM requires a large sample size (eg, 300–400 measurements) to be accurate.51
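The distinction between association and agreement can be made concrete with a small calculation. The sketch below, in Python, computes the percentage of agreement, the Kappa statistic for 2 raters, and the Pearson r. All ratings and times are hypothetical, and the function names are ours.

```python
from collections import Counter
from math import sqrt


def percent_agreement(a, b):
    """Percentage of paired ratings that are identical."""
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Kappa for 2 raters: observed agreement corrected for chance agreement."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


def pearson_r(x, y):
    """Pearson product moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical categorical ratings of 10 patients by 2 raters.
rater_a = ["pass"] * 8 + ["fail"] * 2
rater_b = ["pass"] * 7 + ["fail", "pass", "fail"]
print(percent_agreement(rater_a, rater_b))      # 80% raw agreement
print(round(cohen_kappa(rater_a, rater_b), 3))  # about 0.375 after chance correction

# Hypothetical timed scores: rater 2 is always 2 seconds slower than rater 1.
times_1 = [10.0, 12.0, 15.0, 20.0]
times_2 = [12.0, 14.0, 17.0, 22.0]
print(round(pearson_r(times_1, times_2), 3))    # 1.0, yet no pair of times agrees
```

In the first example, 80% raw agreement shrinks to a Kappa of about 0.375 once chance agreement is removed. In the second, the 2 sets of times vary together perfectly even though the raters never record the same value.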

For the purposes of prognosis or prediction, reliability estimates determined using measures of agreement48 seem to us appropriate. For example, values of the time to rise from and return to sit in a chair (ie, chair rise time) have been determined for quartiles,34 with the expectation that the quartiles indicate a ranking of risk for disability (eg, older people in the second quartile at greater risk for disability than those in the third quartile). Using the chair rise time to classify older people and to make prognoses of the risk for functional decline necessitates, in our view, a reliability calculation using an ICC or Kappa statistic of agreement.

Validity

Measures used for evaluation and for determining the outcome of intervention, in our opinion, should relate to other measures (ie, have criterion-based validity) based on whether the other measures relate to the theoretical constructs underlying the measure (eg, content and construct validity).52,53 The PPT measure of physical function during ADL has been demonstrated to have some validity for some inferences for some subjects by comparison with previous performance-based measures of function (eg, walking speed, POMA) and self-report (eg, Rosow-Breslau Scale, SF-36, FSQ) at one point in time.19 Chair rise time also has been associated with physical function and thus we contend would be valid for evaluating an outcome of physical function,31,54 but not for predicting risk of falling.

When used to make a prognosis (eg, to predict a future event), the purpose of balance measures is often to recognize older people who are likely to fall.55 The criterion for these balance measures then should be the ability of the measure to predict falls, not the ability to detect or predict an intermediate outcome such as improved balance or reduced postural sway. This logic was used by reviewers who identified studies included in a systematic analysis of "best evidence" for interventions for reducing falls among older people.55 Of the measures of community-dwelling older people reviewed, the Berg Balance Scale, PPT, GARS-M, gait speed, and Functional Reach Test meet, to some extent, tests for criterion validity for balance measures (eg, all have been studied for the association with an outcome of falls for older people).3,56–58 The TUG and POMA, although associated with other measures of fall risk, have not been independently demonstrated to be related to the outcome of falls for older people.

Sensitivity and specificity are important characteristics of measures used for screening because, when these indexes are used in this context, the intent is to identify individuals whose performance places them at risk for a designated problem.4,59 Effective practice often depends on correctly identifying the person who will develop problems and tailoring interventions to prevent the occurrence of the problems.6 A cutoff score for any measure is the value at which the optimal combination of sensitivity and specificity can be obtained. Cutoff scores can be determined from the receiver operating characteristic (ROC) curve, a plot of the sensitivity versus 1 − specificity.60 Optimal cutoff values for a specific purpose (eg, predicting fall risk, future disability) are determined from observing the ROC curve for the point that provides the best combination of sensitivity and specificity, the point closest to the upper left-hand corner of the curve.60 For example, on the ROC curve for identifying the risk of recurrent falls from the gait speed of an older person, the optimal cutoff value for identifying fall risk is 0.56 m/s (Figure). Subsequently, likelihood ratios (sensitivity/[1 − specificity], or true-positive rate/false-positive rate) for the measures are determined to indicate the odds of correctly identifying the older person with or without an increased risk for a specific physical function deficit (eg, falling), given a certain value of the measure.4
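The selection of a cutoff score from the ROC curve can be sketched in code. The Python example below computes sensitivity and specificity at each candidate gait speed and picks the point closest to the upper left-hand corner of the ROC plot. The gait speeds and fall outcomes are hypothetical and do not reproduce the data behind the Figure; the function names are ours, and the sketch assumes that a speed at or below the cutoff counts as a positive test and that the sample contains both fallers and nonfallers.

```python
from math import hypot


def sens_spec(speeds, fell, cutoff):
    """Sensitivity and specificity when speed <= cutoff is a positive test."""
    tp = sum(1 for s, f in zip(speeds, fell) if f and s <= cutoff)
    fn = sum(1 for s, f in zip(speeds, fell) if f and s > cutoff)
    tn = sum(1 for s, f in zip(speeds, fell) if not f and s > cutoff)
    fp = sum(1 for s, f in zip(speeds, fell) if not f and s <= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def best_cutoff(speeds, fell):
    """Return (cutoff, sensitivity, specificity) nearest the upper left corner."""
    best = None
    for cutoff in sorted(set(speeds)):
        sens, spec = sens_spec(speeds, fell, cutoff)
        # The ROC point is (1 - specificity, sensitivity); the corner is (0, 1).
        distance = hypot(1.0 - spec, 1.0 - sens)
        if best is None or distance < best[0]:
            best = (distance, cutoff, sens, spec)
    return best[1], best[2], best[3]


# Hypothetical gait speeds (m/s) and whether each person had recurrent falls.
speeds = [0.35, 0.42, 0.48, 0.55, 0.60, 0.66, 0.72, 0.80, 0.95, 1.10]
fell = [True, True, True, False, True, False, False, False, False, False]

cutoff, sens, spec = best_cutoff(speeds, fell)
print(cutoff, sens, round(spec, 2))
# prints "0.6 1.0 0.83"
```

Other rules for choosing the "best" point exist; the distance-to-corner rule is used here because it matches the description in the text.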


Figure. Receiver operating characteristic (ROC) curve for use of walking speed for recognizing risk of recurrent falls among community-dwelling older people with frailty.

 
The meaning of a cutoff score, particularly changing a cutoff score, becomes apparent from the ROC curve, illustrating the trade-off between sensitivity and specificity.17 For example, a cutoff score of 0.50 m/s for walking speed would be associated with a 4 to 1 likelihood of identifying a person who is at risk for falling from a person who is not at risk for falling. Selecting a walking speed of 0.62 m/s as the cutoff score for fall risk reduces the likelihood to 2 to 1 of identifying a person who is at risk for falling from a person who is not at risk for falling (Figure). Cutoff scores determined in a manner other than from the ROC curve do not provide the clinician with the same confidence in the prognosis or prediction.

Measures with known predictive validity allow clinicians to make statements about expected outcomes or performance (prognosis), and this can be helpful in treatment and discharge planning. The physical therapist knowing the predictive validity of measures of physical function for independence in ADL61 will be able to plan for appropriate community services and living arrangements. The performance-based PPT has some predictive validity for identifying who will be residing in a nursing home or dead 18 months later,61 the self-report FSQ has predictive validity for death 51 months later,62 and the short performance battery of lower-extremity function has predictive validity for mobility and ADL disability 4 years later among community-dwelling older people.63 Predictive validity of a measure can also provide a meaningful standard of expected performance, which can be used to determine whether interventions change the expected outcome.64

The responsiveness of a measure is its ability to detect clinically meaningful change over time.65 Responsiveness of a measure is a critical issue for determining the results of interventions.52,53 Effect size (ES) (mean of the change pre-intervention to post-intervention/standard deviation of the baseline values) and standardized response mean (SRM) (mean of the change pre-intervention to post-intervention/standard deviation of the change pre-intervention to post-intervention) represent the responsiveness of a measure and can be used to compare responsiveness with other outcome measures.65 For example, in a study of the responsiveness of the SF-36, the ES and SRM of the SF-36 subscales were small, suggesting that the scale scores did not change dramatically with the intervention for the older people.66 Performance-based measures have been shown to be more responsive to change, detecting a change (eg, preclinical) in physical function before the change becomes measurable by self-reported BADL and IADL scales.45 Changes in self-reported measures may parallel but not match the magnitude of changes in performance-based measures of impairment and disability.21

The smaller the change detected by a measure (better responsiveness), the greater confidence the clinician has that modest changes in the measurements represent real change. Responsiveness determined by comparing data obtained with a measure that is known to be responsive or by detecting changes after an intervention that is known to be efficacious assures clinicians that the measure is detecting "meaningful" change.65
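The 2 responsiveness indexes can be computed from paired pre-intervention and post-intervention scores. The Python sketch below assumes the widely used formulation in which the mean change is the numerator: ES divides it by the standard deviation of the baseline scores, and SRM divides it by the standard deviation of the change scores. The scores are hypothetical and the function names are ours.

```python
from statistics import mean, stdev


def effect_size(baseline, followup):
    """Mean change divided by the standard deviation of baseline scores."""
    changes = [f - b for b, f in zip(baseline, followup)]
    return mean(changes) / stdev(baseline)


def standardized_response_mean(baseline, followup):
    """Mean change divided by the standard deviation of the change scores."""
    changes = [f - b for b, f in zip(baseline, followup)]
    return mean(changes) / stdev(changes)


# Hypothetical scores for 5 people before and after an intervention.
baseline = [10, 12, 14, 16, 18]
followup = [13, 14, 18, 19, 20]

print(round(effect_size(baseline, followup), 2))                 # about 0.89
print(round(standardized_response_mean(baseline, followup), 2))  # about 3.35
```

In this example the SRM is much larger than the ES because everyone changed by a similar amount, so the change scores vary far less than the baseline scores do.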


    Review of Selected Measures
 
Comprehensive Physical Performance—Performance-based Measures

Physical Performance Test.19
The PPT, a measure of usual daily activities, including both BADL and IADL, is a performance-based global measure of physical performance. Developed and tested in frail and well community-dwelling and institutionalized older adults, the PPT has been used to describe and monitor physical performance,19 to screen for falls,3 and to predict the need for institutionalization and the likelihood of death.61 There are 2 versions of the PPT: a 7-item version and a 9-item version. The 7-item PPT consists of the following items: writing a sentence, simulated eating, donning and doffing a jacket, turning 360 degrees while standing, lifting a book, picking up a penny from the floor, and walking 15.2 m (50 ft). For the 9-item PPT, 2 stair-climbing tasks—time to climb a flight of stairs and number of flights climbed (maximum=4)—are added to the items of the 7-item test.19

The PPT takes about 10 minutes to administer and requires only a few simple props, making the measure practical in most clinical settings. Scoring for the majority of the items of the PPT is based on the time it takes to complete the item, reducing the potential for rater bias. Scores on the 7-item and 9-item PPTs range from 0 to 28 and 0 to 36, respectively, with higher scores representing better performance. The PPT involves performance of usual daily tasks; thus, instructions are minimal. In our experience, some older people who do not respond to the verbal request to perform a task when being tested recognize the task when it is demonstrated or presented with the prop (eg, a bowl and spoon) and perform the task appropriately. The PPT was designed for and tested on community-dwelling older adults, and the total score can be compared with percentile rankings for community-dwelling older adults. This should allow clinicians to place an individual's performance in the context of the performance of a population of people over 65 years of age and living in the community.

Interrater reliability for both the 7-item PPT and the 9-item PPT has been determined in a sample of 6 individuals from a geriatric medicine group practice.19 Given the small sample size and the use of the Pearson correlation coefficient, however, we believe that the reliability of the measure cannot be stated with confidence.

Concurrent validity (Pearson r=.50–.80) was established by comparing the PPT with accepted functional status assessments (ie, Rosow-Breslau Scale, IADL, ADL, POMA) in a sample of 183 patients from geriatric outpatient clinics, senior housing units, board-and-care facilities, and a clinic specializing in the treatment of people with Parkinson disease.19 The predictive validity for institutionalization and mortality has also been demonstrated in a study of 149 individuals from 3 different settings (a senior citizen housing unit, an ambulatory-based geriatrics practice, and a board-and-care facility).61 The PPT has also been used to identify individuals who are at risk for recurrent falls (n=84) (sensitivity=78%, specificity=71%, cutoff score=15).3

Comprehensive Physical Performance—Self-Report Measures

Functional Status Questionnaire.46
The FSQ is a self-report measure of physical, psychological, and social role functions in patients who are ambulatory.46 The FSQ has been widely used to screen and monitor functional status (Tab. 1).46,62

The FSQ can be quickly administered, taking approximately 15 minutes to complete. The 6 subscales can be used individually or as a composite.46 An important aspect of the FSQ is the ease of scoring and what we believe are readily understood subscales and summary scores, which are transformed to a 100-point scale (lower scores represent greater limitations). In addition, if a question is left unanswered, the instrument can still be scored. "Warning zones" (ie, score ranges indicating functional disabilities)46 have been developed for each of the 6 subscales to identify individuals with potential problems.46,62

The FSQ has been shown to yield internally consistent measurements for the various subscales (α=.64–.82).46 The ADL and IADL subscales have demonstrated high internal consistency (α=.79 and .82, respectively).46 The social activity subscale also has internal consistency (α=.65).46 High internal consistency indicates that the items of the subscale reflect a single concept or phenomenon.50 The FSQ has been shown to exhibit construct and convergent validity based on correlations with 7 health measures such as reported bed disability days and restricted activity days for older people (mean age=76 years, age range=64–92 years; N=83).20 Scores in the warning zone for the social activities and IADL subscales were independently predictive of mortality.62

Medical Outcomes Study 36-Item Short-Form Health Survey.26
The SF-36 is a general measure of health status that was designed for use in clinical practice and research, for health policy evaluations, and for general population surveys. The SF-36 is a self-report measure, with 8 subscales of health: limitations in physical function, physical role, social function, emotional role, bodily pain, mental health, vitality, and general health perceptions.

The SF-36, a relatively short questionnaire (10 minutes to complete), can be self-administered or administered by a trained interviewer in person or over the phone. In older adults (aged 75 years and older) with poor physical or mental health, some assistance may be needed to complete the questionnaire.67 Scoring can be complex and may require a computer to aid in the process. Scores on each of the subscales range from 0 to 100, with a score of 0 representing worst health and a score of 100 representing best health.

Psychometric properties have been established in both a general medical outpatient population and a frail older adult population. Internal consistency was established with Cronbach alpha coefficients of >.85 and reliability coefficients of >.75.68 Test-retest reliability (ICC=.65–.87) was determined by administering the SF-36 twice, 1 month apart, in a sample of 186 older adults (65 years of age and older).69 In a sample of frail older people (over 65 years of age) with one or more limitations of ADL admitted to a restorative care or day-hospital care facility for older adults, Cronbach alpha coefficients for internal consistency (a measure of the reliability of test items for representing a single construct) were within the range of .72 to .91 for the 8 subscales (N=131), and ICCs for test-retest reliability ranged from .24 to .80, with the ICCs between .61 and .80 (n=41) for all except one of the test subscales.66

Because groups with expected health differences could be differentiated, measurements obtained with the SF-36 have some validity.68 In people who are considered to be at risk for acute deterioration of their health because of age (over 75 years of age) or because of debilitating medical diagnoses (in people 50 to 74 years of age), the bodily pain, social function, physical role, and emotional role subscales may demonstrate a problematic ceiling or floor effect.70

The ability of the SF-36 to detect changes over a 12-month period (responsiveness) was examined in a sample of 131 frail older adults (65 years of age and older). Changes in mean scores over time were found for all scales of the SF-36 except the general health and emotional role subscales. Compared with other measures (eg, Barthel Index, Older Americans Research and Service Center Instrument [OARS]-IADL scale, Spitzer Quality of Life Index), all 8 subscales of the SF-36 were less responsive.66 The SF-36 was originally developed to assess health-related quality of life in a more general population (mean age=54 years, 71% less than 65 years of age; uncomplicated medical diagnoses; N=2,293), so the lack of responsiveness in a frail older population is not surprising.

Rosow-Breslau Scale.71
The Rosow-Breslau Scale is a questionnaire developed in the 1960s to evaluate the relative difficulty of performing tasks of daily living for people with a "high-level" of physical function.19,71 Although the items of the Rosow-Breslau Scale are physically challenging, they are typical of tasks faced by community-dwelling older adults (eg, walk a half mile, walk up to the second floor and down, perform heavy housework). The instrument, in our view, may not be good for measuring the range of performance exhibited by many community-dwelling older adults who are ambulatory and who demonstrate greater disability.

Test-retest reliability for the Rosow-Breslau Scale was examined in a sample of 177 older adults with a mean age of 76.9 years. The Rosow-Breslau Scale was administered twice, on average, 21 days apart. The test-retest reliability was assessed using the Pearson correlation coefficient (Pearson r=.81). Because the Pearson correlation coefficient is a measure of association and rarely appropriate for representing reliability, we cannot be sure of how much error (lack of reliability) is associated with this measure. This correlation also may have been inflated because many of the individuals did not report any functional limitations (n=69). After excluding the individuals with no functional limitations, the Pearson correlation coefficient decreased slightly to .60.72 Concurrent validity of measurements obtained with the Rosow-Breslau Scale was established by comparison with other measures of physical performance.19

Sickness Impact Profile.73
The SIP is a multidimensional measure of health status and the impact of sickness. The SIP has been used extensively in both younger and older individuals to describe and monitor health status.74–78 The scale consists of 136 items in 12 different categories: sleep and rest, emotional behavior, mobility, body care and movement, eating, ambulation, recreation and pastimes, social behavior, communication, alertness behavior, home management, and work.

The SIP, a relatively lengthy questionnaire, can be either self-administered or interviewer administered. However, Bergner et al79 found that, when the SIP was administered through the mail, slightly different information was obtained than when the questionnaire was administered in person. The SIP has been used for both younger people (ages 3–14 years,74 18–30 years,75 and 17–65 years76) and older people (ages 66–93 years,76 ≥80 years,77 and ≥65 years78), and respondent age does not appear to be a factor in the use of the scale.80 Scores on the SIP are expressed as a percentage from 0% to 100%, with higher scores representing greater dysfunction. Overall, category, and dimension scores can be calculated.79

Some psychometric properties have been established for the SIP. Initially, reliability was determined using different interviewers, different forms, different administration procedures, and subjects with varying levels of dysfunction. In this study,73 internal consistency (α=.94) was also determined. Test-retest reliability was determined by administering the SIP twice, 24 hours apart, in a sample of 119 subjects (age range=25–54 years).73 The Pearson product moment correlation was used to examine test-retest reliability for the overall SIP score.73 As stated previously, however, the Pearson correlation coefficient is a measure of association and is rarely appropriate for representing reliability; an ICC to determine the common variance between measures is a better statistical choice.
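The difference between association and agreement can be illustrated with a small sketch. The scores below are hypothetical, and the ICC form used (two-way, absolute agreement, single measurement, often written ICC[2,1]) is an assumption made for illustration, not the form used in the cited studies. When every retest score is exactly 3 points higher than the first score, the Pearson coefficient is 1.0, whereas the ICC is reduced because it counts the systematic shift as disagreement.

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson product moment correlation between two score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def icc_2_1(x, y):
    """Two-way, absolute-agreement, single-measurement ICC for 2 occasions.

    Assumed form for this sketch; other ICC forms give other values.
    """
    n, k = len(x), 2
    rows = list(zip(x, y))
    grand = mean(list(x) + list(y))
    row_means = [mean(r) for r in rows]
    col_means = [mean(x), mean(y)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between occasions
    ss_total = sum((v - grand) ** 2 for r in rows for v in r)
    ss_err = ss_total - ss_rows - ss_cols                    # residual
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Hypothetical test and retest scores: the retest is uniformly 3 points higher.
test_scores = [2, 4, 6, 8, 10]
retest_scores = [s + 3 for s in test_scores]

print(f"Pearson r = {pearson_r(test_scores, retest_scores):.2f}")
print(f"ICC(2,1)  = {icc_2_1(test_scores, retest_scores):.2f}")
```

With these made-up values the two coefficients diverge even though the rank order of subjects is perfectly preserved, which is the situation in which a Pearson coefficient overstates reliability.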

The validity of measurements obtained with the SIP was established by comparison with self-assessment of dysfunction (Pearson r=.69) and self-assessment of sickness (Pearson r=.63) and with clinician assessment of dysfunction (Pearson r=.50). The SIP was also correlated to the National Health Interview Survey (Pearson r=.55), and the instrument could discriminate among subsamples of patients.27 In a sample of older adults without impairments (mean age=72.5 years), scores on the SIP were strongly skewed toward low (good health) scores, with 27% of the subjects having a score of 0%.81 Responsiveness of the SIP was determined in 7 longitudinal projects that differed by the diagnoses of the patients studied in each project. Changes in health indicated by clinical judgment and other health status indexes were associated with changes in SIP scores. Thus, the changes in functional status were identified using the SIP, and the changes appeared to be valid representations of changes in health-related functional status.82

Mobility and Balance—Performance-based Measures

Berg Balance Scale.33
The Berg Balance Scale is a performance-based measure designed to monitor performance during balance activities,33 to screen for individuals who therapists perceive would benefit from a physical therapy referral,5 and to predict multiple falls in community-dwelling and institutionalized older adults.83

The Berg Balance Scale consists of 14 common tasks, requires only a few props, and takes approximately 15 to 20 minutes to administer. The 14 items are scored on a 5-point ordinal scale (0=unable to perform, 4=independent) based on ability to complete the task and time for completion. The scores on the 14 items are combined for a total score, which can range from 0 to 56, with a higher score relating to better performance.

The Berg Balance Scale has some established psychometric properties. The Cronbach alpha value for internal consistency for the entire scale for a sample of older adults (N=38) was .96.33 Interrater reliability was determined by having one physical therapist administer the test and rate 14 older people and then having 5 physical therapists rate the same 14 patients from videotaped recordings of the evaluation (ICC=.98)33; for individual items, reliability ranged from .71 to .99. Four of the physical therapists rated the same videotapes of the balance testing 1 week later and determined intrarater reliability for the total score to be ICC=.99; for individual items, the ICCs ranged from .71 to .99.33 Interrater reliability determined with multiple raters using videotapes of the administration of the test indicates the reliability of scoring the test and not the reliability of test administration by multiple raters.

Concurrent validity for the Berg Balance Scale as a measure of balance and mobility was determined by comparison with tests of postural sway (Pearson r=–.55), the POMA balance subscale (Pearson r=.91), and the TUG (Pearson r=–.76).83 The Berg Balance Scale has been used to identify individuals who would "benefit" from a referral for physical therapy as determined by the recommendation of physical therapists based on a screening physical examination (sensitivity=84% and specificity=78% using a cutoff score for referral of 48).5 In addition, a score of less than 45 was shown to be predictive of risk for recurrent falls by a meta-analysis (N=110 older people)55 and predictive of a future fall in 113 older people.57

Functional Reach Test.28
The Functional Reach Test was developed as a measure of the margin of stability. The Functional Reach Test has been used to describe and monitor an individual's balance28 and to screen for or predict an individual who is at risk for falling.58

The Functional Reach Test is an easily administered performance-based measure. The test consists of measuring the distance that an individual can reach forward without moving his or her feet. The score is the distance (in inches) that the person can reach. Variations of the test, having individuals reach in different directions, have been suggested. Shumway-Cook and Woollacott84 referred to age-related normative values for functional reach for men and women in 2 age groups: people aged 41 to 69 years and people aged 70 to 87 years. However, we argue that the values should not be considered age-related norms, given the design of the original research from which the values were taken. The values of functional reach by age groups in the original study28 were based on small numbers (age 41–69 years: 22 men, 28 women; age 70–87 years: 20 men, 14 women). We believe the sample was not representative of the general older adult community-dwelling population (eg, volunteers, excluded left-hand–dominant people, smaller number of female subjects although the number of women substantially exceeds the number of men among the oldest old), and there was no indication of the range of ages within the groups.28

Test-retest reliability (ICC=.81) and interrater reliability were demonstrated in a sample of 128 volunteers whose ages ranged from 21 to 87 years.28 Concurrent validity for balance and physical function was established by comparison with center-of-pressure excursion (Pearson r=.71) and various measures of physical performance (Spearman rho, r=.64–.71) in a sample of 45 community-dwelling older adults aged 66 to 104 years.85

Predictive validity for falls has also been determined in a prospective study of a sample of 217 male veterans.58 Compared with people who reached 25.4 cm or more, people with a reach of greater than 15.2 cm but less than 25.4 cm were twice as likely to fall, people with a reach of 15.2 cm or less were 4 times more likely to fall, and people who could not reach were 8 times more likely to fall (fall likelihood referring to people who fell 2 or more times in 6 months). The odds ratios for the category of reach give the strength of association between a category and recurrent falls, but they indicate little about the meaning of a specific value relative to the risk for recurrent falls, as all categories are relative to the group who could reach 25.4 cm or more. In addition, of the individuals who were unable to perform the reaching task (score=0, n=24), only 8 (33.3%) reported 2 or more falls. Therefore, although individuals who were unable to reach were 8 times more likely to fall than the people who could reach 25.4 cm or more, only 33% of the people who were unable to reach fell 2 or more times in 6 months. A cutoff score for risk of recurrent falls determined from a ROC curve plotted from the sensitivity and specificity of reach values for recognizing people with a history of recurrent falls should indicate the predictive ability of the measure, but this has not been reported. We question whether the Functional Reach Test is a measure of balance, because a recent study measuring reach in elderly people without impairments and individuals with vestibular hypofunction, who were expected to have poor balance, showed no difference in functional reach distance between the 2 groups.86
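The contrast between a relative measure and an absolute proportion can be worked through as arithmetic. The sketch below uses the counts given above (8 of 24 people who could not reach had recurrent falls) and treats "8 times more likely" as an unadjusted odds ratio of exactly 8.0, which is an assumption; the published estimate may have been adjusted for other factors. The back-calculated reference-group value is therefore illustrative only and is not a reported result.

```python
def risk_to_odds(risk):
    """Convert a proportion (0-1) to odds."""
    return risk / (1.0 - risk)

def odds_to_risk(odds):
    """Convert odds back to a proportion (0-1)."""
    return odds / (1.0 + odds)

# Counts taken from the text: people unable to perform the reaching task.
unable_n = 24
unable_recurrent_fallers = 8
risk_unable = unable_recurrent_fallers / unable_n

# Assumption: an unadjusted odds ratio of exactly 8.0 versus the
# reference group (reach of 25.4 cm or more).
assumed_odds_ratio = 8.0
reference_odds = risk_to_odds(risk_unable) / assumed_odds_ratio
reference_risk = odds_to_risk(reference_odds)

print(f"Recurrent fallers among those unable to reach: {risk_unable:.1%}")
print(f"Implied proportion in reference group (illustrative): {reference_risk:.1%}")
```

Under these assumptions, a large odds ratio coexists with two thirds of the highest-risk category not having recurrent falls, which is why a cutoff with reported sensitivity and specificity would be more informative for an individual patient.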

Gait speed.29
Gait speed is an extremely common measure used to describe and monitor mobility29,87 and to screen for falls3 in older adults. Gait speed is easy to measure by timing an individual while he or she walks at a habitual pace over a known distance. The distances used to calculate gait speed have ranged from 6 m88,89 to 20 m.87,90,91

Test-retest reliability (Pearson r=.93, ICC=.78), with 48 hours between measurements, was established in a sample of 199 adults who were over 55 years of age (60% of the sample was over 70 years of age).42 The sample represents a spectrum of ages of older adults living in the community, not a subgroup of only young old adults. In people who were more frail (N=105), test-retest reliability for measurements obtained approximately 2 weeks apart was ICC=.79.92 Gait speed has been shown to be a valid measure of the ability to walk in older adults (over 60 years of age) by comparison with stride length (Pearson r=.93, n=51;29 r=.84, n=27 and r=.88, n=22 for those with and without a history of falls, respectively)90, cadence (Pearson r=.74, n=51),29 and double support time (Pearson r=.86, n=49)29 and by comparison with a measure of gait abnormalities associated with falling (Spearman rho, r=–.68, n=84;34 r=.82, n=27 and r=.79, n=22 for those at risk and not at risk for recurrent falls, respectively).90

Sensitivity (72%) and specificity (74%) of gait speed for recognizing the risk of recurrent falls have been determined by physical therapists recording gait speed in frail older adults, including a cutoff score of 0.56 m/s for risk of recurrent falls.3 Sensitivity (80%) and specificity (89%) using a cutoff score of 0.57 m/s for identifying individuals who would benefit from physical therapy evaluation and possible treatment (as determined by comparison with recommendations of physical therapists conducting screening physical examinations) have also been established.5
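How sensitivity and specificity follow from a chosen cutoff score can be sketched as below. The gait speeds and fall histories are hypothetical values invented for illustration, not data from the cited studies, and treating a speed strictly below the cutoff as a positive screen is an assumption, because the text does not state how a speed exactly at the cutoff was handled.

```python
def sensitivity_specificity(records, cutoff):
    """Return (sensitivity, specificity) for a 'slower than cutoff' screen.

    records: iterable of (gait_speed_m_per_s, has_recurrent_falls) pairs.
    A speed strictly below the cutoff counts as a positive screen (assumption).
    """
    tp = fn = tn = fp = 0
    for speed, is_faller in records:
        flagged = speed < cutoff
        if is_faller and flagged:
            tp += 1          # recurrent faller correctly flagged
        elif is_faller:
            fn += 1          # recurrent faller missed
        elif flagged:
            fp += 1          # non-faller incorrectly flagged
        else:
            tn += 1          # non-faller correctly passed
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical sample: (gait speed in m/s, history of recurrent falls).
sample = [
    (0.40, True), (0.50, True), (0.55, True), (0.70, True),
    (0.45, False), (0.60, False), (0.80, False), (0.90, False), (1.00, False),
]

sens, spec = sensitivity_specificity(sample, cutoff=0.56)
print(f"sensitivity={sens:.0%}, specificity={spec:.0%}")
```

Repeating the calculation over a range of cutoffs and plotting sensitivity against 1 minus specificity gives the ROC curve from which a cutoff is usually chosen.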

Modified Gait Abnormality Rating Scale.34
The original Gait Abnormality Rating Scale (GARS)90 from which the GARS-M was derived was designed to measure abnormalities of gait for older adults who are at risk for falling. The GARS has been used to describe gait and to distinguish people who are likely to fall from other nursing home residents,90 whereas the GARS-M has been tested most extensively on community-dwelling, frail older adults.3,34

The GARS-M is a 7-item measure designed to identify abnormalities of gait for older adults who are at risk for falling. The GARS-M is administered by videotaping the individuals as they walk on level surfaces. The 7 items of the GARS-M (gait variability, guardedness, staggering, foot contact, hip range of motion, shoulder extension, and arm–heel-strike synchrony) are scored from the videotape of the individual walking. The 7 items of the GARS-M are scored on a 4-point criterion scale (0–3), with higher scores representing poorer performance. The total score of the GARS-M ranges from 0 to 21, with higher scores indicating greater abnormality and risk for falling. One advantage of the GARS-M is that anyone can videotape the subject walking and the professional can presumably score the GARS-M from the videotape at a more convenient time.

A range of measurement characteristics has been established for the GARS-M. The GARS-M has demonstrated interrater reliability (Kappa coefficient [κ]=.97) and intrarater reliability (κ=.97).34 The Kappa coefficient statistic was used to indicate agreement of raters because the GARS-M is a categorical measure. Item scores represent a description of gait and are ranked by presumed difficulty (0–3), but the distance between scores is not equivalent.93 Concurrent validity of data obtained with the GARS-M was determined by comparison with temporal and spatial gait characteristics. Construct validity of data obtained with the GARS-M was determined by the ability of the measure to distinguish older adults with a history of recurrent falls from older adults without a history of recurrent falls.34 Sensitivity (62.3%) and specificity (87.1%) for risk of recurrent falls have been determined, with a cutoff score of 9 for identifying individuals who are at risk for recurrent falls.3

Performance-Oriented Mobility Assessment.35
The POMA is a widely used clinical measure of gait and balance in community-dwelling older adults.5,35,38,94,95 The POMA has been used to describe and monitor balance and gait and to identify individuals who are at risk for falling. The original version of the POMA consisted of the direct observation of the performance of 13 balance skills (eg, sit-to-stand transfers, standing balance, turning 360°) and 9 gait skills (eg, initiation of gait, step length and height, path deviation) and rating the skills as "normal," "adaptive," or "abnormal."35 In subsequent versions, the individual items have been scored using a criterion-based scale, with scores ranging from 0 to 1 or from 0 to 2. The item scores are combined for a balance scale score, a gait scale score, and a total score. Since the original version,35 several different versions of the balance and gait scales have been described, with variability in the items included and scoring of the subscales. We have found 5 different versions of the POMA (Tab. 2). The items included in the gait and balance subscales and the scoring of the subscales differed. For example, balance subscale scores excluded some items, used a descriptive score (eg, "normal," "adaptive," "abnormal"), or used a numerical score (eg, 0–1, 0–2), and total scores differed (11, 15, and 16) for the 5 versions. All of the investigators who wrote about these versions referred to the original version of the POMA35 as the source for scoring and for psychometric properties. Only Robbins et al94 reported using a modified version of the POMA, but these authors did not indicate how the scale was modified or what items were included in the modified version.


Table 2. Variation in Items and Scoring of the Performance-Oriented Mobility Assessment (POMA)

 
With many different versions of the scale in use but not necessarily clearly identified, describing the measurement characteristics of any version of the POMA is difficult. In the most commonly cited reference for reliability of data obtained with the measure, 85% agreement was reported between 2 raters for the individual items for 15 subjects and less than 10% disagreement was reported for total scores between raters.35 Percentage of agreement, however, can be misleading in that the percentage indicates the frequency of agreement but does not indicate the degree of difference in scoring for the items in which the raters disagreed. There is also no adjustment for chance agreement, and there are no probability estimates.
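The effect of leaving out an adjustment for chance agreement can be shown with a short sketch that compares percentage of agreement with the Kappa coefficient. The ratings are hypothetical (20 item ratings by 2 raters, constructed to give 85% agreement) and are not data from the cited POMA study.

```python
from collections import Counter

def percent_agreement(rater_a, rater_b):
    """Proportion of items on which the two raters gave the same rating."""
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(rater_a) | set(rater_b)
    # Chance agreement from each rater's marginal category frequencies.
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical ratings: most items are rated "normal" by both raters.
rater_a = ["normal"] * 16 + ["abnormal"] * 1 + ["normal"] * 1 + ["abnormal"] * 2
rater_b = ["normal"] * 16 + ["abnormal"] * 1 + ["abnormal"] * 1 + ["normal"] * 2

print(f"percent agreement = {percent_agreement(rater_a, rater_b):.0%}")
print(f"kappa             = {cohens_kappa(rater_a, rater_b):.2f}")
```

Because both hypothetical raters use "normal" for most items, much of the 85% agreement would be expected by chance alone, and the Kappa coefficient is correspondingly much lower.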

Some investigators have provided some validity testing for the POMA in comparative studies of physical performance-based measures.19 The balance section score in combination with hip weakness, postural hypotension, and cognitive impairment has been shown to predict the possibility of falls.94 In addition, sensitivity (68%) and specificity (78%), using a cutoff score of 14 for identifying individuals who would benefit from a physical therapy evaluation and possible treatment, have been established for a balance subscale of the POMA.5

Balance measures (parallel, semi-tandem, and tandem stand).32
The ability to stand with feet parallel, semi-tandem, and tandem is a commonly used measure of balance and was most extensively cited in the Frailty and Injuries Cooperative Studies of Intervention Techniques (FICSIT) trials.32 Various versions or combinations of the test exist, but the basic idea is the same. The individual is timed while attempting to maintain balance with the feet in a variety of positions.

The FICSIT static balance measure (FICSIT-3) is used to test a person's ability to maintain balance in parallel, semi-tandem, and tandem stances.32 In addition to the 3 positions of the FICSIT-3 measure, there is the FICSIT-4, in which one-legged stance was used in addition to the FICSIT-3 measure.96 The FICSIT-3 balance measure was utilized at all 8 sites of the FICSIT study, which included both community-dwelling people (n=2,265) and nursing home residents (n=294). The FICSIT-4 measure was used only at the 3 sites that included the least frail community-dwelling people (n=491).96

Rossiter-Fornoff et al96 determined test-retest reliability for the FICSIT-3 measure by administering the measure twice, with intervals between measurements ranging from 3 to 12 months. Pearson product moment correlation coefficients were used to estimate reliability, with values ranging from .25 to .74; the lowest correlations (.25 and .38) were associated with the sites with the greatest time between tests. The Pearson correlation coefficient is a measure of association and is not representative of reliability. Concurrent criterion-related validity for the FICSIT-3 balance measure was determined by comparison with various mobility-related measures such as gait speed and the ambulation section of the SIP.32 Validity was further established by comparing the scores on the FICSIT-3 balance measure between 2 distinct groups (ie, community-dwelling individuals [N=2,265] and nursing home residents [N=294]). The residents of the nursing home were substantially worse at maintaining their balance (percentage scoring the maximum: parallel stance=32%, semi-tandem stance=36%, tandem stance=5%) than the community-dwelling individuals (percentage scoring the maximum: parallel stance=93%, semi-tandem stance=89%, tandem stance=61%).96 Rossiter-Fornoff et al,96 studying exercise intervention for community-dwelling older people, found that one problem with the use of the FICSIT-3 is a substantial ceiling effect in a community-dwelling sample. With the addition of the one-legged stance task to the FICSIT-3, the FICSIT-4 appears to discriminate balance over a wider range of health status in community-dwelling older adults (percentage scoring the maximum: one-legged stance=50% compared with tandem stance=100%, N=296).96

Timed chair rise.96
The timed chair rise has been used by several investigators to examine functional status, lower-extremity muscle force, and balance in older adults. The timed chair rise has been commonly used in large-scale epidemiological studies of older adults.36,63,92

The test consists of timing an individual rising from a standard chair without the use of the arms for support on the chair. The grade is "able" or "unable." If the grade is "able," then the time (in seconds) taken to complete the test is recorded. If the grade is "unable," the individual is tested with the use of the arms for support on the chair. Time to complete 5 chair stands has also been used to reflect lower-extremity muscle force, balance, and functional mobility.36,54,63,92

The timed chair rise has yielded data with test-retest reliability (ICC=.84–.92), when the tests were done 2 to 5 days apart, in a sample of 76 older adults (mean age=70.5 years, SD=5.5).55 Test-retest reliability in a frail, older adult population was greater for repetitive chair rise (ICC=.67) than for the single chair stand (ICC=.25).92 Interobserver reliability (Pearson r=.93) has been reported for 2 testers on the same day.97 Concurrent validity of data obtained with the timed chair rise test (Pearson r=.71–.78) was established by comparison with measurements of lower-body strength in 76 older adults. The timed chair rise test can detect differences among age groups (ages 60–69 years, n=32; 70–79 years, n=96; 80–89 years, n=62; F=4.4; df=2,187; P<.01) and activity levels (high activity, n=144; low activity, n=46; F=21.9; df=1,188; P<.0001), thus suggesting construct validity.54

Timed "Up & Go" Test.22
The TUG is a widely used performance-based measure of functional mobility in community-dwelling older adults (Tab. 1). The TUG has been used to describe and monitor functional mobility. Quick and easy to administer, the TUG consists of timing an individual as he or she stands, walks 3 m, turns 180 degrees, and returns to the chair and sits down. The score on the test is the time it takes (in seconds) to complete the task.22

Reliability and validity have been described for the TUG. Interrater reliability (ICC=.99) and intrarater reliability (ICC=.99) were established for a sample of 60 patients (mean age=79.5 years) referred to a geriatric day hospital.22 Concurrent validity was demonstrated in the same sample by comparison with the Berg Balance Scale (Pearson r=–.81), gait speed (Pearson r=–.61), and the Barthel Index (Pearson r=–.51).22 Scores have been reported for the TUG, distinguishing between older adults who are mostly independent (<20 seconds) and those who need some help in everyday activities (>30 seconds). However, an intermediate range of scores referred to as being in a gray zone (20–29 seconds) represents older adults with varying levels of independence in mobility.22
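The score bands described above can be expressed as a simple lookup. The sketch below is illustrative only: the function name and labels are hypothetical, and because the text gives the gray zone as 20 to 29 seconds and the dependent band as more than 30 seconds, times from 29 to 30 seconds are placed in the gray zone here as an assumption.

```python
def classify_tug(seconds):
    """Map a Timed "Up & Go" time (seconds) to the bands described in the text.

    Assumption: times of 20 to 30 seconds inclusive fall in the gray zone,
    because the source does not assign times between 29 and 30 seconds.
    """
    if seconds <= 0:
        raise ValueError("TUG time must be a positive number of seconds")
    if seconds < 20:
        return "mostly independent"
    if seconds > 30:
        return "needs some help in everyday activities"
    return "gray zone (varying independence in mobility)"

for t in (12.5, 25.0, 35.0):
    print(t, "->", classify_tug(t))
```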

A recent report attempted to show that TUG scores have some predictive validity for falls in community-dwelling older people.98 However, the study was not conducted prospectively, so we question whether prediction of falls can be based on this measure. In addition, the study compared 2 groups of older adults at opposite ends of the spectrum of fall risk (ie, older people with no history of falling versus older people who use assistive devices and have a history of multiple falls),98 which is not a representative sample for determining sensitivity and specificity of the TUG for identifying fall risk.

Short physical performance battery for lower-extremity function.36,63
Guralnik et al36 have combined measures of gait speed, balance, and timed chair rise to develop a short physical performance battery for lower-extremity function. The battery was used to assess lower-extremity function in the Established Populations for Epidemiologic Studies of the Elderly (EPESE) in individuals 65 years of age and older.

The battery takes only 10 to 15 minutes to administer. Scores on each of the tasks range from 1 to 4 and are based on the time needed to complete each task. Scores were established using the percentiles of time needed to complete the task in the original study population (1≤25th percentile, 2=26th–50th percentiles, 3=51st–75th percentiles, and 4≥76th percentile).36 Individuals completing a task in little time, indicating better performance, are given higher scores. Individuals who are unable to complete a task are given a score of 0. A summary performance score for the battery is calculated by summing the scores for the 3 tasks (battery score range=0–12).
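The summary score is a straightforward sum of the 3 task scores, which can be sketched as follows. The function and argument names are hypothetical, and the time cut points that convert a raw performance into a category score are not reproduced because they come from the original study population.

```python
def battery_summary_score(gait_speed_score, balance_score, chair_rise_score):
    """Sum the 3 task scores of the short physical performance battery.

    Each task score is 0 (unable to complete the task) or 1-4
    (performance category), so the summary score ranges from 0 to 12.
    """
    scores = (gait_speed_score, balance_score, chair_rise_score)
    for score in scores:
        if score not in (0, 1, 2, 3, 4):
            raise ValueError(f"task score must be an integer from 0 to 4, got {score!r}")
    return sum(scores)

# Example: unable to complete the chair rise, mid-range on the other 2 tasks.
print(battery_summary_score(gait_speed_score=2, balance_score=3, chair_rise_score=0))
```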

Reliability was not established for these measures; however, the investigators63 argued that interrater reliability has already been established for the individual components of the battery. Derived measurements do not necessarily have the same reliability as the measurements used in the calculation.39,50 The short physical performance battery for lower-extremity function has some predictive validity for nursing home admission and death in community-dwelling older adults.36 In older adults who were determined to be nondisabled by self-report initially, the short performance battery for lower-extremity function was predictive of incident mobility and ADL disability 4 years later.63 However, there is some question whether the individuals studied were disabled initially based on their performance on the gait speed measure.99 Initially, 75% of the subjects walked at a gait speed of less than 0.77 m/s, which is considerably less than the usual gait speed of adults without impairments (1.2–1.3 m/s).89,91,100

Fitness for Activity—Performance-based Measures

Seated Step Test (SST).40
The SST is a performance-based measure of exercise tolerance, fitness for activity, and endurance.40 The SST was designed to provide a graded exercise test that is less intense than submaximal and maximal treadmill and bicycle ergometer tests of exercise tolerance. To describe the fitness for ADL of frail older people (eg, having difficulty in performance of more than one ADL task), the SST was modified to a single-stage test, the modified seated step test (MSST).101,102 Seated step tests have been used to describe the fitness level of community-dwelling older adults, particularly when the target population has been expected to include older people who are frail or disabled.102,103

The SST is a 4-stage, submaximal exercise test that is conducted with the participant seated.40,103 The test consists of alternate placement of the feet on the edge of a step at a stepping rate of one step per second. The test begins at a step height of 15.2 cm (6 in) (stage 1), increasing step height for stages 2 (30.5-cm [12-in] step) and 3 (45.7-cm [18-in] step), and the addition of alternating arm movements in stage 4. Stages of the SST represent workloads of 2.3, 2.9, 3.5, and 3.9 metabolic equivalents (METs), respectively, roughly equivalent to the energy requirements of walking at the MET level in miles per hour (eg, 2.3 METs is equivalent to 2.3 mph).40

During the SST, heart rate and blood pressure are monitored initially and at 2 and 5 minutes of stepping. A heart rate under 75% of the age-predicted maximum at 5 minutes during any stage means the person can continue the test into the next stage.40 Seated step test stages were adjusted to 3 minutes each for the Women's Health and Aging Study.103 The investigators used the American College of Sports Medicine (ACSM) guidelines for safety in exercise testing of older individuals to determine who could start the test and who should stop.
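The progression criterion can be written as a short check. This sketch rests on an assumption the text does not supply: it estimates the age-predicted maximum heart rate as 220 minus age, a commonly used approximation. The stage workloads are those listed above; the function names are illustrative, and the ACSM start and stop criteria are not modeled.

```python
# Workloads for SST stages 1-4 as given in the text (metabolic equivalents).
STAGE_METS = {1: 2.3, 2: 2.9, 3: 3.5, 4: 3.9}


def age_predicted_max_hr(age_years: float) -> float:
    """ASSUMPTION: 220 minus age; the article does not specify a formula."""
    return 220.0 - age_years


def may_advance(hr_at_5_min: float, age_years: float, stage: int) -> bool:
    """Return True if the person may continue into the next stage."""
    if stage not in STAGE_METS:
        raise ValueError("stage must be 1, 2, 3, or 4")
    if stage == 4:  # no stage follows stage 4
        return False
    # The criterion is a heart rate strictly under 75% of the predicted maximum.
    return hr_at_5_min < 0.75 * age_predicted_max_hr(age_years)
```

For an 80-year-old, the assumed formula gives a predicted maximum of 140 beats per minute, so the threshold for continuing is a heart rate under 105 beats per minute at 5 minutes.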

The MSST involves only stage 1 of the SST, with heart rate and blood pressure monitored initially, at 2 and 5 minutes of stepping, and 2 minutes after stepping. Based on the person's ability to initiate and complete the test and on his or her response to the exercise stress, the older individual can be classified as "unhealthy," "deconditioned," or "conditioned."101,102 The advantage of SSTs for older people, particularly those with frailty, is that exercise stress testing is not dependent on the individual's ability to walk.103

In the study done by Simonsick and Fried,103 44.1% of 1,002 older participants were excluded from performing the SST during physical examination because of cardiac findings or leg weakness as determined by the nurse evaluator. An additional 2.3% excluded themselves because they believed that they could not complete the test. The most common reason reported for stopping the SST was the older person reporting an inability to continue (ie, the test was symptom limited).103

The step tests of exercise tolerance have limited reported measurement characteristics. No form of reliability has been reported for the SST. Test-retest reliability of data obtained with the MSST indicated 67% agreement for the classification of older adults who had not changed physical function (eg, no change in PPT percentile rank) over a period of 1 to 6 months.102 The percentage of agreement is modest at best; however, the long time between measurements and the basis for determining no change (comprehensive physical function, PPT score) may have contributed to the low reliability. Percentage of agreement of the sample is often not applicable for the population. Therapists selecting the MSST to use would need to determine the reliability for their sample. Despite the incomplete psychometric properties (particularly reliability), we believe that the SST and MSST are unique among physical performance measures and that the measures contribute useful information to our understanding of physical function of older adults.
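For therapists who want to check agreement in their own sample, percentage of agreement is simply the share of people given the same classification on both occasions. The sketch below uses made-up classifications for four people; it does not reproduce the data from the cited study, and the function name is ours.

```python
from typing import Sequence


def percent_agreement(first: Sequence[str], second: Sequence[str]) -> float:
    """Percentage of people classified identically on two occasions."""
    if len(first) != len(second) or len(first) == 0:
        raise ValueError("need two non-empty lists of equal length")
    matches = sum(a == b for a, b in zip(first, second))
    return 100.0 * matches / len(first)


# Hypothetical MSST classifications for four people, tested twice.
test_1 = ["conditioned", "deconditioned", "unhealthy", "deconditioned"]
test_2 = ["conditioned", "unhealthy", "unhealthy", "deconditioned"]
agreement = percent_agreement(test_1, test_2)
```

Percentage of agreement does not correct for agreement expected by chance, which is one reason a sample-specific estimate should be interpreted cautiously.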

Construct validity for the SST and MSST has been determined by comparison with self-reported and performance-based measures of functional status, respectively.102,103 For the SST, of the women who were most disabled, 61% completed stage 1 and 13% completed stage 2 of the test. In contrast, among the women with moderate disability or no disability, 75% completed stage 1 and 31% completed stage 2.103 For the MSST, comparing the individuals classified as "unhealthy" with those classified as "deconditioned" or "conditioned," a greater proportion of those classified as "unhealthy" demonstrated physical performance below the 25th percentile of the PPT (88% versus 46%). Likewise, fewer of the individuals classified as "unhealthy" demonstrated performance above the 50th percentile of the PPT (9%) compared with the proportion of those classified as "deconditioned" or "conditioned" (33%).102

Six-Minute Walk Test.41
The Six-Minute Walk Test is used as a measure of exercise tolerance and endurance for community-dwelling older adults.41 The Six-Minute Walk Test has been used to describe and monitor an individual's endurance level.41

The test is easy to administer and consists of measuring the distance a person can walk in 6 minutes. The Six-Minute Walk Test is dependent on an individual's ability to ambulate. Lack of standardization in administering the test has been the major criticism of the test.104 Mungall and Hainsworth105 recommended completion of the Six-Minute Walk Test 3 times, with the third test distance recorded for the most accurate representation of the individual's fitness level.

One-week test-retest reliability (Pearson r=.95) of data obtained with the test was determined in a sample of 86 older adults without significant disease (eg, some subjects had chronic conditions such as arthritis and hypertension, but the subjects had no life-threatening or disabling conditions such as cardiac dysfunction or cerebrovascular accident).106 Thus, reliability of data obtained with the Six-Minute Walk Test has been demonstrated on a representative sample of community-dwelling older people. Validity was demonstrated by comparing the measurements obtained with the Six-Minute Walk Test with those obtained with cycle ergometer exercise testing (Spearman rho, r=.58) and with functional classification (Spearman rho, r= .50–.60).107 The distance covered during the Six-Minute Walk Test was different for inactive older individuals living in retirement homes (mean distance covered=274.6 m [901 ft]) compared with active older individuals attending community centers (mean distance covered=496.5 m [1,629 ft]), thus demonstrating known-groups validity.106
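To put the two mean distances on a more familiar scale, they can be converted to an average walking speed over the 6 minutes. This is our own arithmetic, not a result reported in the cited studies, and an average over 6 minutes (which may include slowing or rests) is not the same quantity as usual gait speed measured over a short course.

```python
def mean_speed_m_per_s(distance_m: float, minutes: float = 6.0) -> float:
    """Average speed over the whole test: distance divided by elapsed time."""
    return distance_m / (minutes * 60.0)


inactive = mean_speed_m_per_s(274.6)  # retirement-home group, about 0.76 m/s
active = mean_speed_m_per_s(496.5)    # community-center group, about 1.38 m/s
```

Expressed this way, the inactive group averaged roughly half the speed of the active group.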

Fitness for Activity—Self-Report Measures

Physical Activity Scale for the Elderly (PASE).108
The PASE is a self-report measure designed to assess the level of physical activity in the past week in older people. Three components of physical activity (ie, leisure, occupational, and household) are assessed using the PASE.

The PASE can be administered over the phone or by personal interview, or it can be self-administered as a mail survey. Total PASE scores for physical activity are computed by multiplying activity weights (item weights based on motion sensor counts, physical activity diary records, and global activity ratings108) by activity frequencies. Scoring can be complex, but an administration and scoring manual can be obtained for a nominal fee. The PASE is copyright protected, so permission must be granted from the New England Research Institute prior to using the scale.
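The scoring principle, multiplying each activity's weight by its frequency and summing, can be sketched as follows. The weights and frequencies here are hypothetical placeholders; the actual PASE item weights and scoring rules are in the copyrighted administration and scoring manual and are not reproduced. The function name is ours.

```python
from typing import Mapping


def weighted_activity_score(frequencies: Mapping[str, float],
                            weights: Mapping[str, float]) -> float:
    """Sum of (activity weight x activity frequency) over all reported items."""
    missing = set(frequencies) - set(weights)
    if missing:
        raise KeyError(f"no weight for: {sorted(missing)}")
    return sum(weights[item] * freq for item, freq in frequencies.items())


# HYPOTHETICAL weights and frequencies; not the published PASE values.
example_weights = {"walking": 2.0, "housework": 3.0, "gardening": 5.0}
example_frequencies = {"walking": 1.5, "housework": 1.0, "gardening": 0.0}
score = weighted_activity_score(example_frequencies, example_weights)
```

An activity that was not performed contributes nothing to the total, regardless of its weight.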

Test-retest reliability for the PASE was determined by administering the test to 254 men and women who were at least 65 years of age 2 times, 2 to 7 weeks apart. Pearson correlation coefficients were .68 for telephone administration and .84 for mail administration.108 Concurrent validity was determined in 222 men and women who were at least 65 years of age by comparison with various health-related variables such as perceived health, SIP total score, heart rate, grip strength, balance, and dominant leg strength. Pearson correlation coefficients ranged from –.13 to –.42, with the strongest correlation associated with the SIP total score.108 In addition, the PASE was shown to correlate with energy expenditure (r=.58).109 The PASE has also been used for patients. In a sample of people with knee pain (N=471, mean age=71.4 years), validity was demonstrated by correlation with perceived difficulty with physical functioning (r=.35), the Six-Minute Walk Test (r=.35), and knee strength (r=.41).110

Scenarios: Applying the Selection of Measures

The following 2 examples are an illustration of the selection of a set of measures for assessment of older adults under different conditions.

Scenario 1: community-dwelling, well older adults referred for assessment and intervention, including general physical conditioning and optimal physical function for independent living.
The target population is primarily well older adults, meaning no current acute health problems. The older people live independently in the community and typically would be expected to have no major difficulty with performance of ADL. The concern is for promoting health and identifying subtle impairments for which intervention may improve performance, possibly slowing the rate of decline of physical function.

For comprehensive physical performance, the PPT is, in our view, a good measure relative to context (previously used for a similar population), practicality (10 minutes to administer, requires little experience or training to administer, minimal equipment required to implement), and psychometric properties (limited reliability, construct and predictive validity for morbidity and mortality). The PPT also provides a benchmark (percentile ranking) for the person's performance compared with other community-dwelling older adults. In addition to the performance-based measure, a self-report measure could provide information about the person's perception of physical performance. The brevity and ease of scoring and the population previously studied with the FSQ make the questionnaire an ideal choice, in our opinion. The FSQ has internal consistency reliability for all subscales, construct and convergent validity for health measures, and predictive validity for mortality.

A mobility and balance measure, in our opinion, may not be useful, given the circumstances, because we would not expect the group to demonstrate notable problems with balance or walking. The majority of the individuals would likely attain the optimum score (ceiling effect) on a mobility or balance measure; thus, in our view, little information would be gained for the time and effort of conducting the measure. Some measure of mobility, we believe, may still have a role in assessment. Gait speed is an easily obtained measure of mobility and balance. The continuous nature of the measurement of gait speed provides the possibility of recognizing minimal clinically important changes, which may be an early physical performance sign of decline. Gait speed has been used extensively in studies of well older adults and older adults with frailty; thus, extensive comparative data exist.3,34,87,90,91,100 We would choose gait speed over the TUG because the ability to compare scores with a broad range of previous data may better enable the clinician to determine the clinical meaningfulness of specific findings.

We believe that endurance is important to assess in this population of older people without current acute problems living in the community for 2 reasons: (1) we believe there is a relationship between fatigue and the decline of physical function and loss of independence,111,112 and (2) we contend that a relationship exists among endurance exercise, health status, and the reduction of risk factors for cardiovascular disorders.113,114 Because balance and mobility are not a problem for this target population, the Six-Minute Walk Test is, in our view, a good option for assessing fitness for activity. The MSST can be used to identify an individual as "conditioned," "deconditioned," or "unhealthy," but we believe that it would be advantageous to use the Six-Minute Walk Test to provide a continuous measure of endurance.

Because of the positive relationship between physical activity and health status in older people,115 we believe it would be important to gather information about the person's physical activity level. Use of the PASE to acquire information about physical activity practices of older people, including specific activities and frequency and intensity of activity, would yield information not provided by the Six-Minute Walk Test. The physical activity information (PASE) in combination with the measure of fitness (Six-Minute Walk Test) can be used to design a regular exercise program for the improvement of health status of the individual.

Scenario 2: physical performance assessment of community-dwelling older adults with frailty participating in comprehensive geriatric assessment.
The comprehensive geriatric assessment process involves evaluation by members of a multidisciplinary, consultative team, making recommendations and establishing a plan of care to be followed through by the referring physician, the older person, and the caregiver. The assessments are performed largely to screen for older people who are at risk for health problems and to determine independent living status, which could be used to determine clinical and social service needs. Physical therapy assessment in this scenario is often used to identify older people who are at risk for loss of independence in community-dwelling status. Particularly of interest would be the identification of the older adults who are at risk for health problems and who would benefit from intervention to optimize physical function and to decrease the level of assistance necessary for community living.

Comprehensive geriatric assessment teams often focus on assessing function. Several team members, therefore, may acquire information about functional status, mobility, and fall history from the patient's report or reports of family members or caregivers. We believe that the physical therapist is uniquely suited to provide information about performance in physical function of the older person. Therefore, selection of assessment instruments could possibly be restricted to performance-based measures. The PPT, a comprehensive performance-based measure of ADL, is, in our opinion, an obvious choice. Besides the PPT being one of a very few comprehensive performance-based measures, the wide range of attainable scores and prior testing among community-dwelling older adults illustrate the usefulness of the measure for screening the physical performance of community-dwelling older adults with frailty. The PPT is practical for this frail population given the brevity (10 minutes to administer) and use of familiar tasks in testing. Particularly desirable among the psychometric properties is the predictive validity for institutionalization or death, providing information that is useful in planning community and health services.

The GARS-M or the Berg Balance Scale could be useful for assessment of balance. In addition to the established measurement characteristics (eg, sensitivity and specificity; cutoff scores for risk of recurrent falls or benefit from physical therapy to reduce risk), the physical therapist may derive insight into impairments that need to be addressed through intervention from the observed performance (although these measures have not been validated for this use). If the geriatric assessment team perceives a primary role for screening and predicting performance that places the older person at risk for falling and loss of independence, then the GARS-M may be the better selection because of the demonstrated validity for identifying risk of recurrent falls.3 Likewise, the Berg Balance Scale may be the better choice if the primary role of the team is identifying people who, in the physical therapist's opinion, may improve their performance with physical therapy for balance and mobility training, based on the demonstrated validity (sensitivity=84% and specificity=78% for a cutoff score of 48) of the Berg Balance Scale against the clinician's professional opinion.5 Gait speed may provide a direct measure of mobility and a relative measure of balance.
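As a reminder of what the quoted sensitivity and specificity mean, both can be computed from a 2-by-2 table of test result against the reference standard (here, the clinician's opinion). The counts below are hypothetical, chosen only so that the output matches the percentages quoted above; they are not the counts from the cited study.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Proportion of people with the condition whom the test identifies."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Proportion of people without the condition whom the test clears."""
    return true_neg / (true_neg + false_pos)


# HYPOTHETICAL counts per 100 people with and 100 without the condition.
sens = sensitivity(true_pos=84, false_neg=16)
spec = specificity(true_neg=78, false_pos=22)
```

Moving the cutoff score trades one property against the other, so a cutoff reported for one purpose (eg, identifying who may benefit from therapy) may not suit another.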

Endurance in this population may indicate the physiological capacity for participating in physical therapy and provide some indication of the capacity of the older individual to perform ADL. By identifying the fitness category, the MSST provides critical information regarding the response to exercise and the activity. Among elderly people with frailty, we suggest that an emphasis on identifying the category of response ("unhealthy," "deconditioned," or "conditioned") provides more useful information about the person's cardiovascular risks (and physiological safety) for loss of independence and planning for intervention (eg, diagnostic studies of cardiovascular system performance) than the person's level of performance (distance walked) as indicated using the Six-Minute Walk Test. A person could walk 6 minutes with frequent rest periods, but have cardiac disease that would go unrecognized during the examination. Despite an impairment of mobility and balance, no one need be excluded from the assessment, as the MSST of endurance is not dependent on ambulatory status.


    Summary
 
The high cost (time, resources, and effort) of measuring physical function of older adults necessitates a return on the investment of clients, caregivers, and clinicians involved in the process. If the clinician's intent is to make meaningful statements about physical performance of older adults, the careful use of measures of physical function is necessary. We assume that effective interventions for the reduction of morbidity, currently a major individual and societal problem of people living longer,79,116 improve with the accuracy of evaluating physical function, making a prognosis of response to physical therapy or risk for decline, and directing intervention to specific problems of those identified to be at risk. Thoughtful selection of measures (appropriate for the target population, practical to administer, and psychometrically sound) for assessing physical function of community-dwelling older adults is an essential step along a path toward evidence-based practice in geriatric physical therapy.


    Footnotes
 
Both authors provided concept/project design and writing.

* The Standards for Tests and Measurements in Physical Therapy Practice15 were developed by the American Physical Therapy Association's Task Force on Standards for Measurement in Physical Therapy in 1991 and referred to in the Guide to Physical Therapist Practice (rev ed, 1999) for clinical issues of measurement by physical therapists. The specific standards highlighted in the Standards for Tests and Measurements in Physical Therapy Practice, particularly those recommended for the tertiary purveyor (ie, teacher) and user of measures, further define issues within the major areas of concern.


    References
 

  1. Winograd CH. Targeting strategies: an overview of criteria and outcomes. J Am Geriatr Soc.1991; 39:25S–35S.[Medline]
  2. Rubenstein LZ. Documenting impacts of geriatric consultation. J Am Geriatr Soc.1987; 35:829–830.[Web of Science][Medline]
  3. VanSwearingen JM, Paschal KA, Bonino P, Chen T. Assessing recurrent fall risk of community-dwelling, frail older veterans using specific tests of mobility and the Physical Performance Test of function. J Gerontol A Biol Sci Med Sci.1998; 53:M457–M464.[Abstract]
  4. Sackett DL, Haynes RB, Guyatt GH, Tugwell P. Clinical Epidemiology: A Basic Science for Clinical Medicine. Boston, Mass: Little, Brown and Company,1991 .
  5. Harada N, Chiu V, Damron-Rodriguez J, et al. Screening for balance and mobility impairment in elderly individuals living in residential care facilities. Phys Ther.1995; 75:462–469.[Abstract/Free Full Text]
  6. Rubin CD, Sizemore MT, Loftis PA, Loret de Mola N. A randomized, controlled trial outpatient geriatric evaluation and management in a large public hospital. J Am Geriatr Soc.1993; 41:1023–1028.[Web of Science][Medline]
  7. Rubenstein LZ, Josephson KR, Harker JO, et al. The Sepulveda GEU Study revisited: long-term outcomes, use of services, and costs. Aging.1995; 7:212–217.[Medline]
  8. Wieland D, Rubenstein LZ. What do we know about patient targeting in geriatric evaluation and management (GEM) programs? Aging.1996; 8:297–310.[Medline]
  9. Reuben DB, Borok GM, Wolde-Tsadik G, et al. A randomized trial of comprehensive geriatric assessment in the care of hospitalized patients. N Engl J Med.1995; 332:1345–1350.[Abstract/Free Full Text]
  10. Stuck AE, Siu AL, Weiland D, et al. Comprehensive geriatric assessment: a meta-analysis of controlled trials. Lancet.1993; 342:1032–1036.[Web of Science][Medline]
  11. Hing E. Use of nursing homes by the elderly: preliminary data from the 1985 National Nursing Home Survey. In: Advance Data From Vital and Health Statistic. Hyattsville, Md: Public Health Service;1987 . Dept of Health and Human Services Publication No. (PHS) 87–1250.
  12. International Classification of Impairments, Disabilities, and Handicaps. Geneva, Switzerland: World Health Organization,1980 .
  13. Smyth KA, Ferris SH, Fox P, et al. Measurement choices in multi-site studies of outcomes in dementia. Alzheimer Disease and Associated Disorders.1997; 11:30–44.[Medline]
  14. Schulz R, Williamson GM. The measurement of caregiver outcomes in Alzheimer disease research. Alzheimer Disease and Associated Disorders.1997; 11:117–124.[Web of Science][Medline]
  15. Task Force on Standards for Measurement in Physical Therapy. Standards for tests and measurements in physical therapy practice. Phys Ther.1991; 71:589–622.[Abstract/Free Full Text]
  16. Feinstein AR, Josephy BR, Wells CK. Scientific and clinical problems in indexes of functional disability. Arch Intern Med.1986; 105:413–420.
  17. Gifford DR, Cummings JL. Evaluating dementia screening tests: methodologic standards to rate their performance. Neurology.1999; 52:224–227.[Free Full Text]
  18. Mahoney F, Barthel D. Functional evaluation: the Barthel Index. Md State Med J.1965; 14:61–65.[Medline]
  19. Reuben DB, Siu AL. An objective measure of physical function of elderly outpatients: the Physical Performance Test. J Am Geriatr Soc.1990; 38:1105–1112.[Web of Science][Medline]
  20. Reuben DB, Valle LA, Hays RD, Siu AL. Measuring physical function in community-dwelling older persons: a comparison of self-administered, interviewer-administered, and performance-based measures. J Am Geriatr Soc.1995; 43:17–23.[Web of Science][Medline]
  21. Guralnik JM, Branch LG, Cummings SR, Curb JD. Physical performance measures in aging research. J Gerontol.1989; 44:M141–M146.
  22. Podsiadlo D, Richardson S. The Timed "Up & Go": a test of basic functional mobility for frail elderly persons. J Am Geriatr Soc.1991; 39:142–148.[Web of Science][Medline]
  23. VanSwearingen JM. Predicting falls [letter to the editor]. Phys The. In press.
  24. Feinstein AR. Clinimetrics. New Haven, Conn: Yale University Press,1987 .
  25. Jette AM. Measuring subjective clinical outcomes. Phys Ther.1989; 69:580–584.[Abstract/Free Full Text]
  26. Ware JE, Sherbourne CD. The MOS 36-Item Short-Form Health Survey (SF-36), I: conceptual framework and item selection. Med Care.1992; 30:473–483.[Web of Science][Medline]
  27. Bergner M, Bobbitt RA, Pollard WE, et al. The Sickness Impact Profile: validation of a health status measure. Med Care.1976; 14:57–67.[Web of Science][Medline]
  28. Duncan PW, Weiner DK, Chandler J, Studenski S. Functional reach: a new clinical measure of balance. J Gerontol.1990; 45:M192–M197.
  29. Imms F, Edholm O. Studies of gait and mobility in the elderly. Age Aging.1981; 10:147–156.[Abstract/Free Full Text]
  30. Cerny K. A clinical method of quantitative gait analysis. Phys Ther.1983; 63:1125–1126.[Abstract/Free Full Text]
  31. Csuka M, McCarthy DJ. Simple method of measurement of lower extremity muscle strength. Am J Med.1985; 78:77–81.[Web of Science][Medline]
  32. Buchner DM, Hornbrook MC, Kutner NG, et al. Development of the common data base for the FICSIT trials. J Am Geriatr Soc.1993; 41:297–308.[Web of Science][Medline]
  33. Berg KO, Wood-Dauphinee SL, Williams JI, Gayton D. Measuring balance in the elderly: preliminary development of an instrument. Physiotherapy Canada.1989; 41:304–311.
  34. VanSwearingen JM, Paschal KA, Bonino P, Yang JF. The Modified Gait Abnormality Rating Scale and recognizing recurrent fall risk of community-dwelling, frail older veterans. Phys Ther.1996; 76:994–1002.[Abstract/Free Full Text]
  35. Tinetti ME. Performance-oriented assessment of mobility problems in elderly patients. J Am Geriatr Soc.1986; 34:119–126.[Web of Science][Medline]
  36. Guralnik JM, Simonsick EM, Ferrucci L, et al. A short physical performance battery assessing lower extremity function: association with self-reported disability and prediction of mortality and nursing home admission. J Gerontol.1994; 49:M85–M94.
  37. Tinetti ME, Speechley M, Ginter SF. Risk factors for falls among elderly persons living in the community. N Engl J Med.1988; 319:1701–1707.[Abstract]
  38. Tinetti ME, Ginter SF. Identifying mobility dysfunctions in elderly patients: standard neuromuscular examination or direct assessment? JAMA.1988; 259:1190–1193.[Abstract/Free Full Text]
  39. Kane RA, Kane RL. Assessing the Elderly: A Guide to Practical Measurement. Lexington, Mass: Lexington Books,1981 .
  40. Smith EL, Gilligan C. Physical activity prescription for the older adult. The Physician and Sportsmedicine.1983; 11:91–101.
  41. Butland RJ, Pang J, Gross ER, et al. Two-, six-, and 12-minute walking tests in respiratory disease. Br Med J (Clin Res Ed).1982; 284:1607–1608.[Medline]
  42. Tager IB, Swanson A, Satariano WA. Reliability of physical performance and self-reported functional measures in an older population. J Gerontol.1998; 53:M295–M300.[Web of Science]
  43. Crawford SL, Jette AM, Tennstedt SL. Test-retest reliability of self-reported disability measures in older adults. J Am Geriatric Soc.1997; 45:338–341.[Web of Science][Medline]
  44. Rozzini R, Frisoni GB, Bianchetti A, et al. Physical Performance Test and activities of daily living scales in the assessment of health status in elderly people. J Am Geriatr Soc.1993; 41:1109–1113.[Web of Science][Medline]
  45. Rozzini R, Frisoni GB, Ferrucci L, et al. The effect of chronic disease on physical function: comparison between activities of daily living scales and the Physical Performance Test. Age Aging.1997; 26:281–287.[Abstract/Free Full Text]
  46. Jette AM, Davies AR, Cleary PD, et al. The Functional Status Questionnaire: reliability and validity when used in primary care. J Gen Intern Med.1986; 1:143–149.[Web of Science][Medline]
  47. Lord SR, Lloyd D, Nirui M, et al. The effect of exercise on gait patterns in older women: a randomized controlled trial. J Gerontol.1996; 51:M64–M70.
  48. Kirshner B, Guyatt GH. A methodological framework for assessing health indices. J Chronic Dis.1985; 38:27–36.[Web of Science][Medline]
  49. Guide to Physical Therapist Practice. 2nd ed. Phys The.2001; 81:9–744.
  50. Rothstein JM, Echternach JL. Primer on Measurement: An Introductory Guide to Measurement Issues. Alexandria, Va: American Physical Therapy Association,1993 .
  51. Nunnally JC. Psychometric Theory. New York, NY: McGraw-Hill Book Co,1967 :162–210.
  52. Guyatt GH, Walter S, Norman G. Measuring change over time: assessing the usefulness of evaluative instruments. J Chronic Dis.1987; 40:171–178.[Web of Science][Medline]
  53. Guyatt GH, Deyo RA, Charlson M, et al. Responsiveness and validity in health status measurement: a clarification. J Clin Epidemiol.1989; 42:403–408.[Web of Science][Medline]
  54. Jones CJ, Rikli RE, Beam W. A 30-s chair-stand test as a measure of lower body strength in community-residing older adults. Res Q Exerc Sport.1999; 70:113–119.[Web of Science][Medline]
  55. Gillespie LD, Gillespie WJ, Cumming R, et al. Interventions for preventing falls in the elderly. Cochrane Database Syst Rev.2000; (2):CD000340.
  56. Riddle DL, Stratford PW. Interpreting validity indexes for diagnostic tests: an illustration using the Berg Balance Test. Phys Ther.1999; 79:939–948.[Abstract/Free Full Text]
  57. Berg KO, Wood-Dauphinee SL, Williams JI, Maki BE. Measuring balance in the elderly: validation of an instrument. Can J Public Health.1992; 83(suppl 2):S7–S11.
  58. Duncan PW, Studenski S, Chandler J, Prescott B. Functional reach: predictive validity in a sample of elderly male veterans. J Gerontol.1992; 47:M93–M98.
  59. Weiss N. Clinical Epidemiology: The Study of the Outcome of Illness. New York, NY: Oxford University Press,1986 .
  60. Beck JR, Shultz EK. The use of relative operating characteristic (ROC) curves in test performance. Arch Pathol Lab Med.1986; 110:13–20.[Web of Science][Medline]
  61. Reuben DB, Siu AL, Kimpau S. The predictive validity of self-report and performance-based measures of function and health. J Gerontol.1992; 47:M106–M110.
  62. Reuben DB, Rubenstein LV, Hirsch S, Hays R. Value of functional status as a predictor of mortality: results of a prospective study. Am J Med.1992; 93:663–669.[Web of Science][Medline]
  63. Guralnik JM, Ferrucci L, Simonsick EM, et al. Lower extremity function in persons over the age of 70 years as a predictor of subsequent disability. N Engl J Med. 1995;332:556–561.
  64. Guralnik JM, Ferrucci L, Pieper C, et al. Lower extremity function and subsequent disability: consistency across studies, predictive models, and value of gait speed alone compared with the short physical performance battery. J Gerontol. 2000;55:M221–M231.
  65. Stratford PW, Binkley JM, Riddle DL. Health status measures: strategies and analytic methods for assessing change scores. Phys Ther. 1996;76:1109–1122.
  66. Stadnyk K, Calder J, Rockwood K. Testing the measurement properties of the Short Form-36 Health Survey in a frail elderly population. J Clin Epidemiol. 1998;51:827–835.
  67. Hayes V, Morris J, Wolfe C, Morgan M. The SF-36 health survey questionnaire: is it suitable for use with older adults? Age Ageing. 1995;24:120–125.
  68. Brazier JE, Harper R, Jones NM, et al. Validating the SF-36 health survey questionnaire: new outcome measure for primary care. Br Med J. 1992;305:160–164.
  69. Andersen EM, Bowley N, Rothenberg BM, et al. Test-retest performance of a mailed version of the Medical Outcomes Study 36-Item Short-Form Health Survey among older adults. Med Care. 1996;34:1165–1170.
  70. Wolinsky FD, Wan GJ, Tierney WM. Changes in the SF-36 in 12 months in a clinical sample of disadvantaged older adults. Med Care. 1998;36:1589–1598.
  71. Rosow I, Breslau N. A Guttman Health Scale for the aged. J Gerontol. 1966;21:556–559.
  72. Smith LA, Branch LG, Scherr PA, et al. Short-term variability of measures of physical function in older people. J Am Geriatr Soc. 1990;38:993–998.
  73. Pollard WE, Bobbitt RA, Bergner M, et al. The Sickness Impact Profile: reliability of a health status measure. Med Care. 1976;14:146–155.
  74. Iorio R, Pensati P, Botta S, et al. Side effects of alpha-interferon therapy and impact on health-related quality of life in children with chronic viral hepatitis. Ped Infect Dis J. 1997;16:984–990.
  75. de Jong W, Kaptein AA, van der Schans CP, et al. Quality of life in patients with cystic fibrosis. Pediatr Pulmonol. 1997;23:95–100.
  76. Corran TM, Farrell MJ, Helme RD, Gibson SJ. The classification of patients with chronic pain: age as a contributing factor. Clin J Pain. 1997;13:207–214.
  77. Herlitz C, Dahlberg L. Causes of strain affecting relatives of Swedish oldest elderly: a population-based study. Scand J Caring Sci. 1999;13:109–115.
  78. Andersen EM, Rothenberg BM, Kaplan RM. Performance of a self-administered mailed version of the Quality of Well-Being (QWB-SA) questionnaire among older adults. Med Care. 1998;36:1349–1360.
  79. Bergner M, Bobbitt RA, Carter WB, Gilson BS. The Sickness Impact Profile: development and final revision of a health status measure. Med Care. 1981;19:787–805.
  80. Gilson B, Gilson J, Bergner M, et al. The Sickness Impact Profile: development of an outcome measure of health care. Am J Public Health. 1975;65:1304–1310.
  81. Andersen EM, Patrick DL, Carter WB, Malmgren JA. Comparing the performance of health status measures for healthy older adults. J Am Geriatr Soc. 1995;43:1030–1034.
  82. de Bruin AF, Diederiks JP, de Witte LP, et al. Assessing the responsiveness of a functional status measure: the Sickness Impact Profile. J Clin Epidemiol. 1997;50:529–540.
  83. Berg KO, Maki BE, Williams JI, et al. Clinical and laboratory measures of postural balance in an elderly population. Arch Phys Med Rehabil. 1992;73:1073–1080.
  84. Shumway-Cook A, Woollacott MH. Assessment and treatment of patients with postural disorders. In: Shumway-Cook A, Woollacott MH, eds. Motor Control: Theory and Practical Applications. Baltimore, Md: Williams & Wilkins; 1995:207–235.
  85. Weiner DK, Duncan PW, Chandler J, Studenski S. Functional Reach: a marker of physical frailty. J Am Geriatr Soc. 1992;40:203–207.
  86. Wernick-Robinson M, Krebs DE, Giorgetti M. Functional reach: does it really measure dynamic balance? Arch Phys Med Rehabil. 1999;80:262–269.
  87. Himann J, Cunningham D, Rechnitzer P, Paterson D. Age-related changes in speeds of walking. Med Sci Sports Exerc. 1988;20:161–166.
  88. Gabell A, Nayak USL. The effect of age on variability in gait. J Gerontol. 1984;39:662–666.
  89. Ostrosky K, VanSwearingen JM, Burdett R, Gee Z. A comparison of gait characteristics in young and old subjects. Phys Ther. 1994;74:637–646.
  90. Wolfson L, Whipple R, Amerman P, Tobin JN. Gait assessment in the elderly: a gait abnormality rating scale and its relation to falls. J Gerontol. 1990;45:M12–M19.
  91. Blanke DJ, Hageman P. Comparison of gait of young and elderly men. Phys Ther. 1989;69:144–148.
  92. Jette AM, Jette DU, Ng J, et al. The Musculoskeletal Impairment (MSI) Study Group: Are performance-based measures sufficiently reliable for use in multicenter trials? J Gerontol. 1999;54:M3–M6.
  93. Cohen JA. A coefficient of agreement for nominal scales. Educational and Psychological Measurement. 1960;20:37–46.
  94. Robbins AS, Rubenstein LZ, Josephson KR, Schulman BL, et al. Predictors of falls among elderly people. Arch Intern Med. 1989;149:1628–1633.
  95. Tinetti ME, Williams TF, Mayewski R. Fall risk index for elderly patients based on number of chronic disabilities. Am J Med. 1986;80:429–434.
  96. Rossiter-Fornoff JE, Wolf SL, Wolfson LI, Buchner DM. A cross-sectional validation study of the FICSIT common data base static balance measures. J Gerontol A Biol Sci Med Sci. 1995;50:M291–M297.
  97. Nevitt MC, Cummings SR, Kidd S, et al. Risk factors for recurrent nonsyncopal falls. JAMA. 1989;261:2663–2668.
  98. Shumway-Cook A, Brauer S, Woollacott MH. Predicting the probability for falls in community-dwelling older adults using the Timed Up & Go Test. Phys Ther. 2000;80:896–903.
  99. Ikegami N. Functional assessment and its place in health care. N Engl J Med. 1995;332:598–599.
  100. Hageman P, Blanke DJ. Comparison of gait of young women and elderly women. Phys Ther. 1986;66:1382–1387.
  101. Faub J. Cardiovascular Endurance and Physical Function of Community-Dwelling Frail Older Veterans [thesis/dissertation]. Pittsburgh, Pa: University of Pittsburgh; 1994.
  102. VanSwearingen JM, Paschal KA. The relationship between fitness and clinical measures of physical performance among community-dwelling frail older veterans [abstract]. Phys Ther. 1996;76:72S.
  103. Simonsick E, Fried LP. Exercise tolerance and body composition. In: Guralnik JM, Fried LP, Simonsick EM, et al, eds. The Women's Health and Aging Study: Health and Social Characteristics of Older Women With Disability. Bethesda, Md: National Institute on Aging; 1995:106–117. NIH Publication No. 95-4009.
  104. Guyatt GH, Pugsley SO, Sullivan MJ, et al. Effect of encouragement on walking test performance. Thorax. 1984;39:818–822.
  105. Mungall I, Hainsworth R. Assessment of respiratory function in patients with chronic obstructive airways disease. Thorax. 1979;34:254–258.
  106. Harada ND, Chiu V, Stewart AL. Mobility-related function in older adults: assessment with a 6-minute walk test. Arch Phys Med Rehabil. 1999;80:837–841.
  107. Guyatt GH, Thompson PJ, Berman LB, et al. How should we measure function in patients with chronic heart and lung disease? J Chronic Dis. 1985;38:517–524.
  108. Washburn RA, Smith KW, Jette AM, Janney CA. The Physical Activity Scale for the Elderly (PASE): development and evaluation. J Clin Epidemiol. 1993;46:153–162.
  109. Schuit AJ, Schouten EG, Westerterp KR, Saris WHM. Validity of the Physical Activity Scale for the Elderly (PASE): according to energy expenditure assessed by the doubly labeled water method. J Clin Epidemiol. 1997;50:541–546.
  110. Martin K, Rejeski WJ, Miller M, et al. Validation of the PASE in older adults with knee pain and physical disability. Med Sci Sports Exerc. 1999;31:627–633.
  111. Wheat ME. Exercise in the elderly. Western J Med. 1987;147:477–480.
  112. Shephard RJ. Exercise and aging: extending independence in older adults. Geriatrics. 1993;48:61–64.
  113. Shephard RJ. The cardiovascular benefits of exercise in the elderly. Topics in Geriatric Rehabilitation. 1985;1:1–10.
  114. Williams MA. Cardiovascular risk-factor reduction in elderly patients with cardiac disease. Phys Ther. 1996;76:469–480.
  115. Mazzeo RS, Cavanagh R, Evans WJ, et al. American College of Sports Medicine Position Stand: Exercise and physical activity for older adults. Med Sci Sports Exerc. 1998;30:1008.
  116. Gill TM, Williams CS, Tinetti ME. Assessing risk for the onset of functional dependence among older adults: the role of physical performance. J Am Geriatr Soc. 1995;43:603–609.



This article has been cited by other articles:

P. Q McGinnis, L. M Hack, K. Nixon-Cave, and S. L Michlovitz
Factors That Influence the Clinical Decision Making of Physical Therapists in Choosing a Balance Assessment Approach
Physical Therapy, March 1, 2009; 89(3): 233 - 247.

S. A. Cook, H. MacLaughlin, and I. C. Macdougall
A structured weight management programme can achieve improved functional ability and significant weight loss in obese patients with chronic kidney disease
Nephrol. Dial. Transplant., January 1, 2008; 23(1): 263 - 268.

S. R Piva, E. A Goodnite, K. Azuma, J. D Woollard, B. H Goodpaster, M. C. Wasko, and G K. Fitzgerald
Neuromuscular Electrical Stimulation and Volitional Exercise for Individuals With Rheumatoid Arthritis: A Multiple-Patient Case Report
Physical Therapy, August 1, 2007; 87(8): 1064 - 1077.

C. Sherrington and S. R Lord
Reliability of simple portable tests of physical performance in older people after hip fracture
Clinical Rehabilitation, May 1, 2005; 19(5): 496 - 504.


Copyright © 2001 by the American Physical Therapy Association.