Research Reports
LI Strand, PT, PhD, is Associate Professor, Section of Physiotherapy Science, Department of Public Health and Primary Health Care, Faculty of Medicine, University of Bergen, Ulriksdal 8c, Bergen, Norway (liv.strand{at}isf.uib.no). Address all correspondence to Dr Strand.
R Moe-Nilssen, PT, PhD, is Associate Professor, Section of Physiotherapy Science, Department of Public Health and Primary Health Care, Faculty of Medicine, University of Bergen.
AE Ljunggren, PT, PhD, is Professor, Section of Physiotherapy Science, Department of Public Health and Primary Health Care, Faculty of Medicine, University of Bergen.
Submitted July 20, 2001;
Accepted July 6, 2002
| Abstract |
|---|
Key Words: Activity limitation, Back pain, Physical performance, Responsiveness, Validity
| Introduction |
|---|
A multitude of self-report questionnaires have been developed to assess pain and daily functioning in patients with back problems.5–8 Pain is a symptom that depends on the response of the person experiencing it9 and can be assessed by self-report measures. Such measures are also important because they indicate how patients perceive their daily functioning, and the measures can be used to examine perceived change over time. Self-report measurements also are simple to obtain.
Observation, however, is often considered the most replicable method of assessing functional performance,10,11 even though data for this assertion often are lacking. Low to moderate agreement often has been demonstrated between self-reported and observed disability.7,12–15 Salén et al7 demonstrated a moderate correlation (r=.48) between patients' self-reported difficulty in performing various daily tasks, as assessed by the Disability Rating Index (DRI), and observers' assessments of actual performance. After the patients had traversed an obstacle course, however, the correlation increased substantially to r=.78. Disability ratings derived from a questionnaire have been found to be higher (worse) than those derived from observation.15 Among 24% of all subjects in a health survey who reported at least one disability, no disability was registered by an external tester who observed the tasks being performed. The correspondence among questionnaire, interview, and clinical examination was addressed in another study.14 Among those respondents who reported no previous low back trouble on a Nordic questionnaire for the low back, only 63% gave the same report in a personal interview, and they were found not to have low back trouble in a blinded clinical examination. As there may be a mismatch between how patients believe they function and how they function as observed by others, we contend that self-report measures should be supplemented with observational methods to guide treatment and to register change in physical performance over time.
The impact of back pain on physical performance may be classified according to the World Health Organization's International Classification of Functioning and Disability (ICF)16 into dimensions of impairment, activity (limitation), and participation (restriction). Traditional physical tests tend to address impairments.17–19 Impairment measures such as those of postural aberrations, decreased muscle force, and range of motion may not be good indicators of musculoskeletal dysfunction20 and disability.21,22
The need for developing appropriate tools for measuring mobility and activities of daily living was recently characterized as a priority for research by an international task force on back pain.23 Timed tests of activities such as walking, the sit-to-stand task, and repeated trunk flexion have been examined in patients with back pain and shown to have what we consider acceptable reliability, to be able to discriminate between people with and without back problems, and to be sensitive to change over time.24–26 The ability to discriminate between people with and without back pain also has been demonstrated in tests of lifting capacity.27,28 Decreased mobility of the trunk often is considered a manifestation of back pain,2,29,30 and activities such as bending, twisting, stooping, crouching, lifting, dressing, and picking up objects often are limited.2,31,32 We believe that performance tests used in patients with low back pain should be useful to elucidate such key aspects of functioning. To relate to daily tasks, we believe we probably need a variety of tests.
Five physical performance tests of compound activities were part of a test battery in a randomized controlled study of patients with long-lasting musculoskeletal pain.33 The tests were presumed by the authors to be useful to assess activity limitations in patients with back pain because all the tests required mobility of the trunk. They were the Sock Test,34 the Pick-up Test,35 the Roll-up Test,36 the Fingertip-to-Floor Test,37 and the Lift Test.38 Discriminate validity and responsiveness to important change are known for each of the tests.4
A simple physical performance test, in our view, may not be sufficient to characterize physical functioning for all individuals with back problems. Although some activities cause pain in some people, they relieve pain in others.39 People can be considered, we believe, more disabled if performance is limited in several activities rather than in one activity. This viewpoint is consistent with the findings of Waddell et al22 and Thomas et al,40 who showed that the sum of the scores of several physical tests could be better to discriminate between people with and without back pain than the use of separate tests. Provided that the tests measure a common construct of physical performance, the sum of the scores of several tests might offer a more comprehensive measure of trunk mobility than a score of a single test. The purpose of our study was to develop a condition-specific, simple, feasible, and clinically useful outcome measure of physical performance in back pain.
Our hypotheses were:
| Method |
|---|
1 year) because of musculoskeletal conditions (mean=3.3 months, SD=2.0).33 Exclusion criteria were pregnancy, substance abuse, and illnesses such as progressive nervous system disease, serious cardiac disease, and acute infection. The patients also had to have sufficient knowledge of the Norwegian language to understand instructions and questionnaires. The patients' conditions were diagnosed by their general practitioner according to the Norwegian translation of the International Classification of Primary Care (ICPC).41 The study was accepted by the Regional Ethics Committee, was approved by the National Data Inspectorate, and was performed according to the Helsinki Declaration. All patients were to be followed for several years using data regarding sickness benefits from the National Health Insurance Register. A total of 1,776 patients were invited to participate, and 1,683 patients met the inclusion criteria. Only 1,071 patients responded, and 573 patients (53.5%) gave written consent. The patients were randomly assigned to 2 groups to receive different interventions. Because the comparison of treatment effectiveness was not the purpose of our study, the specifics of the interventions are not relevant in this report. Two hundred eighty-eight patients with different musculoskeletal conditions were examined using all 5 physical performance tests. They had a diagnosis of back pain with or without radiating pain (n=157), neck or shoulder pain (n=93), or unspecified musculoskeletal pain (n=38).
Of the total group of patients with musculoskeletal pain (n=573), 249 patients (47% men) had back pain. Only 157 patients were examined using all 5 physical performance tests when the study began. Thirty-six of the patients did not attend 1-year follow-up assessments and were considered dropouts, and no data on work status were available for an additional 7 patients at follow-up. Accordingly, responsiveness to change and discriminative ability of the sum of test scores examined at the 1-year follow-up were assessed in 114 patients with back pain. Baseline characteristics and comparisons of demographic and test variables of patients with back pain who were included (n=114) and were not included (n=135) in those analyses are presented in Table 1.
Pick-up Test.
The patient picks up a piece of paper from the floor in whatever way he or she chooses. Performance is assessed using a 4-point ordinal scale. Intertester reliability between 2 testers in 24 patients with musculoskeletal pain was satisfactory (κ=.74), and aspects of validity were indicated.35
Roll-up Test.
The test is one of a large battery of tests developed by Sundsvold et al.36 The patient rolls up slowly, with arms relaxed, from a supine position to an extended sitting position. Performance is assessed on an 8-point ordinal scale. Intertester reliability between 2 testers in 21 patients with musculoskeletal pain was moderate (κ=.59).4
Fingertip-to-Floor Test.
The patient stands on the floor, feet 10 cm apart, and is asked to bend forward with straight knees and try to touch the floor with the fingertips. The distance between the tip of the middle finger and the floor is recorded in centimeters. Intraclass correlation coefficients (1,1) for intertester and test-retest reliability in 73 patients with low back pain of a modified fingertip-to-floor measure have been reported as .95 and .98, respectively.42
Lift Test.
The patient is asked to repeat lifting a box containing a sandbag of 5 kg from the floor to a table and back to the floor for 1 minute.38 The number of lifts is recorded. Reliability of counting lifts per minute has not been reported.
The Back Performance Scale (BPS).
The basis for constructing the BPS sum scale was examined. The methods of score assignment for the Sock Test and the Pick-up Test are shown in Table 2. A score of 0 was considered a good performance with no signs of activity limitation, a score of 1 was considered a somewhat limited performance, a score of 2 was considered a rather distinct limitation of performance, and a score of 3 was considered a substantially limited performance, if performed at all. These criteria were used to rescale the other 3 tests (Tab. 2), and cutoff points of the separate tests were decided by (1) the distribution of scores at the beginning of the study and (2) clinical experience concerning physical performance in people with back pain and those without back pain. The BPS sum score was calculated by adding the individual scores of the 5 tests. Ordinal test scores obtained at the beginning of the study of 157 patients with back pain are given in Table 3. The distribution of BPS sum scores at the beginning of the study is shown in Figure 1.
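The summation step can be illustrated with a minimal sketch. It assumes each of the 5 tests has already been rescaled to the 0-to-3 ordinal score according to Table 2 (the cutoff points are not reproduced here). The dictionary keys and the function name are hypothetical labels chosen for illustration, not part of the published scale.

```python
# Hypothetical item labels; the text names the five tests but no data format.
BPS_TESTS = ("sock", "pick_up", "roll_up", "fingertip_to_floor", "lift")


def bps_sum_score(item_scores):
    """Add the five rescaled item scores (each 0-3) into a 0-15 sum score."""
    missing = [test for test in BPS_TESTS if test not in item_scores]
    if missing:
        raise ValueError(f"missing test scores: {missing}")
    total = 0
    for test in BPS_TESTS:
        score = item_scores[test]
        # Each item must already be on the 4-point ordinal scale (0 = good,
        # 3 = substantially limited performance).
        if score not in (0, 1, 2, 3):
            raise ValueError(f"{test}: score must be 0, 1, 2, or 3, got {score!r}")
        total += score
    return total


# Invented example patient, for illustration only.
example = {"sock": 1, "pick_up": 0, "roll_up": 2, "fingertip_to_floor": 3, "lift": 1}
print(bps_sum_score(example))
```

A higher sum indicates more limited performance, with 0 as the best and 15 as the worst possible score.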
Scale construction.
We believed that the 5 performance tests share a theoretical construct of physical performance: sagittal mobility of the trunk in functional activities. Spearman correlations were calculated to examine the bivariate relationships among the tests, and between each test and the BPS sum score, assessed in patients with back pain at the beginning of the study (n=157). Internal consistency was examined by use of the Cronbach alpha, an overall correlation among the items. Streiner and Norman43 suggested that an alpha should be above .70 to demonstrate sufficient homogeneity of the items but should not be higher than .90, as it may suggest a high level of item redundancy. A high alpha may imply that some of the items are unnecessary and that the scale as a whole is too narrow in its scope to have much validity.43
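As an illustration of the internal-consistency statistic, the following sketch computes the Cronbach alpha from the usual formula, alpha = k/(k-1) × (1 − sum of item variances / variance of the sum score). The input layout (one row of item scores per patient) and the function name are assumptions made for this example; the study's own software is not described in the text.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach alpha for rows of item scores (one row per patient)."""
    if len(rows) < 2:
        raise ValueError("need at least 2 patients")
    k = len(rows[0])
    if k < 2:
        raise ValueError("need at least 2 items")
    if any(len(row) != k for row in rows):
        raise ValueError("every row must have the same number of items")
    items = list(zip(*rows))  # transpose: one tuple of scores per item
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in rows])
    if total_variance == 0:
        raise ValueError("sum scores have zero variance")
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Invented scores for 4 patients on 3 items, for illustration only.
scores = [[0, 1, 0], [1, 1, 2], [2, 3, 2], [3, 2, 3]]
print(round(cronbach_alpha(scores), 2))
```

Values between roughly .70 and .90 would fall in the range that Streiner and Norman describe as homogeneous without being redundant.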
Discriminative ability.
Because the tests of the BPS all require mobility of the trunk, we expected patients with back pain to demonstrate higher BPS scores (more limited performance) than either patients with neck or shoulder pain or patients with unspecified musculoskeletal pain. Because the distributions of BPS scores were not normal in the patients with neck or shoulder pain and in the patients with unspecified musculoskeletal pain (P<.05), we chose to analyze differences between patients with back pain and each of those 2 groups using the nonparametric Mann-Whitney test for independent groups. The level for rejecting the null hypothesis of no difference was P<.05.
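The rank-based comparison can be sketched as follows. This example computes only the Mann-Whitney U statistics, using average ranks for ties; it does not compute a P value, and the function name and the sample scores are invented for illustration.

```python
def mann_whitney_u(group_a, group_b):
    """Return (U_a, U_b) for two independent samples, averaging tied ranks."""
    group_a, group_b = list(group_a), list(group_b)
    if not group_a or not group_b:
        raise ValueError("both groups must be non-empty")
    pooled = sorted(group_a + group_b)
    ranks = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        # Positions i..j-1 hold equal values; their 1-based ranks are i+1..j.
        ranks[pooled[i]] = (i + 1 + j) / 2
        i = j
    n_a, n_b = len(group_a), len(group_b)
    rank_sum_a = sum(ranks[value] for value in group_a)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return u_a, u_b


# Invented BPS sum scores, for illustration only.
back_pain = [7, 9, 4, 6, 8]
neck_shoulder = [2, 3, 5, 1, 4]
print(mann_whitney_u(back_pain, neck_shoulder))
```

U_a counts the pairs in which the first group's score exceeds the second group's score (ties count one half), so a U_a close to the product of the group sizes indicates consistently higher scores in the first group.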
Patients with a history of back pain who still received workers' compensation 1 year after rehabilitation were expected to demonstrate higher BPS scores (more limited performance) than those who had returned fully to work at that time (by chance the same number of patients [n=57] was in both groups). Because the distributions of BPS scores in both groups were normal (P>.20), an independent t test was used to determine whether there was a difference in BPS sum scores between the 2 groups. The level for rejecting the null hypothesis of no difference was P<.05.
Responsiveness.
Responsiveness refers to the power of a scale to detect meaningful change when it occurs.44 A construct for examining responsiveness was suggested by Stratford et al: "those patients judged by an external standard as having achieved an important change will demonstrate greater improvement [by the measure] than those judged by the standard as not achieving an important change."45(p1112) There is no gold standard for important or meaningful change. Because the patients in our study were all on long-term sick leave at the beginning of the study, having returned fully to work at the 1-year follow-up was considered an important change. In order to indicate that the BPS is responsive to important change, those patients should have improved more on the BPS than the patients who had not returned to work, defined as those who still received workers' compensation.
Effect size and receiver operating characteristic (ROC) curve statistics can be used to indicate responsiveness of assessment tools, and both were used to examine responsiveness of the BPS. Effect size is defined as the mean change found in a variable divided by the standard deviation of that variable.46 The mean change of BPS sum scores from the beginning of the study to the 1-year follow-up was divided by the standard deviation of the change and examined in improved and unimproved patients as defined by the dichotomous return-to-work variable. Specific benchmarks of effect size were used for interpreting the magnitude of change. Cohen47 suggested an effect size of .20 to be small, .50 moderate, and .80 to be large. Whether such benchmarks of change are appropriate for the magnitude of change in the present study may be debated. However, a higher effect size was expected in patients who had returned to work after 1 year than in those who had not.
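The effect size calculation described above (mean change divided by the standard deviation of the change) can be sketched as follows. Improvement is taken as baseline minus follow-up because higher BPS scores are worse. The function names, the sample scores, and the label used for values below .20 are assumptions added for illustration.

```python
from statistics import mean, stdev


def effect_size_of_change(baseline, follow_up):
    """Mean improvement divided by the SD of the improvement scores."""
    if len(baseline) != len(follow_up):
        raise ValueError("baseline and follow-up must be paired")
    # Positive change = lower (better) BPS sum score at follow-up.
    changes = [before - after for before, after in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)


def cohen_benchmark(effect_size):
    """Label an effect size using Cohen's .20 / .50 / .80 benchmarks."""
    size = abs(effect_size)
    if size >= 0.80:
        return "large"
    if size >= 0.50:
        return "moderate"
    if size >= 0.20:
        return "small"
    return "below small"  # label for values under .20 is our own wording


# Invented paired BPS sum scores, for illustration only.
baseline = [6, 8, 5, 9]
follow_up = [4, 5, 4, 5]
es = effect_size_of_change(baseline, follow_up)
print(round(es, 2), cohen_benchmark(es))
```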
Responsiveness of the BPS also was examined by ROC curve statistics. Deyo and Centor48 suggested that measures may be viewed as diagnostic tests for discriminating between patients who improved and those who did not improve and, accordingly, can be described in terms of sensitivity and specificity in detecting improvement (yes/no) as established by another criterion of important change. Sensitivity against 1 − specificity was plotted for each of several possible cutoff points in change of BPS sum scores from the beginning of the study to the 1-year follow-up. Sensitivity, in our study, was defined as the number of patients correctly identified by the test as having returned to work divided by all patients who had returned to work after 1 year. We defined specificity as the number of patients correctly identified by the test as not having returned to work divided by all patients who had not returned to work. Equal importance of high sensitivity and specificity was assumed.49 The area under the ROC curve is interpreted as the probability of correctly identifying the improved patients from randomly selected pairs of improved and unimproved patients.48 The area ranges from 0.5 (no accuracy in discriminating improved from unimproved) to 1.0 (perfect accuracy). The areas under the ROC curves were used to compare responsiveness of the BPS with those of the separate 5 performance tests.
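The definitions in this paragraph can be made concrete with a short sketch. It computes sensitivity and specificity at one cutoff of BPS improvement and estimates the area under the ROC curve directly from its pairwise interpretation (ties count one half). The function names and the sample data are invented for illustration and are not the study's data or software.

```python
def sensitivity_specificity(changes, returned_to_work, cutoff):
    """Classify 'returned to work' when improvement >= cutoff."""
    tp = fn = tn = fp = 0
    for change, returned in zip(changes, returned_to_work):
        predicted = change >= cutoff
        if returned and predicted:
            tp += 1
        elif returned:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need patients in both outcome groups")
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(changes, returned_to_work):
    """Probability that a returned patient improved more than a non-returned one."""
    improved = [c for c, r in zip(changes, returned_to_work) if r]
    unimproved = [c for c, r in zip(changes, returned_to_work) if not r]
    if not improved or not unimproved:
        raise ValueError("need patients in both outcome groups")
    wins = 0.0
    for a in improved:
        for b in unimproved:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5  # tied pairs count as half
    return wins / (len(improved) * len(unimproved))


# Invented improvement scores (baseline minus follow-up), for illustration only.
changes = [5, 3, 4, 1, 2, 0, 3, -1]
returned = [True, True, True, True, False, False, False, False]
print(sensitivity_specificity(changes, returned, cutoff=2.5))
print(roc_auc(changes, returned))
```

Repeating the first function over every candidate cutoff yields the points of the ROC curve; the second function summarizes the whole curve in one number.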
| Results |
|---|
Scale Construction
Test scores obtained when the study began for the 157 patients with back pain are shown in Table 3. The test scores for one patient with back problems are illustrated in Figure 2. Bivariate correlations (rs) ranged from .27 to .50 among the 5 tests and from .63 to .73 between the separate tests and the BPS (P<.01) (Tab. 4). Internal consistency of the BPS was .73 (coefficient alpha). The BPS sum scores at the beginning of the study were normally distributed, as illustrated in Figure 1. One patient had a score of 0 (minimum), and one patient had a score of 15 (maximum).
Patients with a history of back pain who still received workers' compensation 1 year after rehabilitation demonstrated higher BPS sum scores (mean=6.3, SD=3.7, range=0–14) than patients who had returned fully to work at that time (mean=3.7, SD=3.2, range=0–14) (P<.001).
Responsiveness
The effect size of change between pretest and posttest measurements assessed by the BPS was 1.33 in patients who had changed and 0.31 in patients who had not changed, as defined by the dichotomous return-to-work variable (Tab. 5). High sensitivity (67%) and high specificity (70%) were obtained with a cutoff point of improvement of 2.5 on the BPS. Approximately two thirds of the patients were correctly classified as having returned to work if they improved at least this much on the BPS, and approximately two thirds of the patients were correctly classified as not having returned to work if they had less improvement. The area under the ROC curve (Fig. 3) was larger (0.77) in the BPS than in each of the separate performance tests (Tab. 6). When transforming the Roll-up Test from an 8-point ordinal scale to a 4-point ordinal scale, the area under the ROC curve did not change. However, the areas under the ROC curve became smaller when the Fingertip-to-Floor Test and the Lift Test were transformed from ratio scales to 4-point ordinal scales (Tab. 6), implying less responsiveness to change.
| Discussion |
|---|
"Back pain" is an ambiguous term because this condition includes different syndromes.51 Physical performance may reflect not only the nature and severity of the underlying pathology, but also how the patient interprets and reacts to pain.39 The tests were not more than moderately correlated, and we believe they could be used to assess somewhat different aspects of performance. For an individual patient, some activities may be very difficult to perform, whereas other activities are not difficult to perform (Fig. 2). The lowest and highest scores were frequently achieved by our subjects on the separate performance tests, indicating that ceiling and floor effects were present, which means that a test cannot register greater gains or greater decline respectively.52 The problem of ceiling and floor effects was almost eliminated when the BPS was applied on groups of patients with long-lasting back pain.
Usually, when a new self-report instrument is developed to assess a dimension of health, the first task is to generate a pool of all potentially relevant items. Based on clinical judgment and statistical procedures, items are then selected from this pool for inclusion in the final instrument.53 In our study, the choice of test items was post hoc, restricted to those initially included in the test armament of the Bergen Study: Back to Work33 and further restricted to tests apparently designed to measure trunk mobility in functional tasks. It is an issue for further research to consider other tests, also involving side bending and twisting, for supplementing or replacing tests of the BPS.
Loss of physical abilities due to back pain may be attributable to physiological changes in motor patterns and reduced overall performance, rather than to isolated spinal impairments. The 5 activities tested in the BPS are performed from different starting positions, such as standing (Pick-up Test, Fingertip-to-Floor Test, Lift Test), sitting (Sock Test), and supine (Roll-up Test), causing gravity to act differently on the body, and some activities are performed with straight knees, whereas other activities are performed with flexed knees. Each test seems to us to measure the behavioral outcome that results from the interaction of the individual, the task, and the environment, which according to Gentile54 is an analysis at the activity level. Because behavioral outcomes related to daily tasks are reflected in the BPS, we believe the BPS has the potential of being a performance measure with clinical usefulness. The BPS was intended to be a practical measure of activity limitation, and we believe it is quick to perform and simple to assess.
Discriminative Ability
Discriminative ability of the BPS was demonstrated. The BPS sum scores were higher (worse) in patients with back pain than in patients with neck or shoulder pain and those with generalized pain. Activity limitation was particularly demonstrated by the high scores of the patients with back pain, indicating that the BPS is a condition-specific measure for people with back pain. All of the tests involve coordinated actions of the trunk in relation to the lower extremities. A correlation of r=.60 has been demonstrated between the fingertip-to-floor distance and flexibility of the hamstring muscles.55 Therefore, it remains to be seen whether the BPS may also be useful to reflect activity limitation in patients with lower extremity dysfunction.
Performance scores also were higher (worse) in patients who had not returned fully to work 1 year after rehabilitation than in those who had returned fully to work. Discriminative ability is commonly examined by contrasting test scores of patients and those of subjects without the condition being studied. In our study, all participants had a history of long-lasting back pain. An even larger difference in BPS scores might have been obtained if test scores of patients with back problems had been contrasted to those of people with no recent sick leave because of back problems.
Responsiveness
Responsiveness of the BPS to change was demonstrated by 2 commonly used methods.45 Effect size statistics relate the magnitude of change to the variability in scores, but controversy exists as to whether variability of baseline scores or change in scores should be used in the equation.47,56,57 Because baseline and follow-up measurements are not independent, we used standard deviation of the change, in accordance with the suggestion of Cohen.47 Effect size was very high in patients who had improved, as defined by the external dichotomous return-to-work variable, but it was low in those who had not improved. This finding indicated to us that the BPS is responsive to change at the participation level of functioning. Test-retest reliability of BPS scores has not been examined, but the effect sizes in the 2 groups indicate to us that this aspect of reliability is sufficient.
Whether or not the return-to-work variable is an optimal external indicator to examine responsiveness of the BPS is arguable. Physical performance is only one of many factors (socioeconomic, psychological, demographic, job characteristics) that influence whether or not patients return to work after long-term sick leave.58–60 A patient's global impression of change49 may be shown to be an alternative external indicator of change in the future. Research, we believe, also should determine a numerical value of change by the BPS that is meaningful to monitor perceived change in an individual patient.
Responsiveness of the BPS to change also was demonstrated by ROC curve analysis. Change in BPS sum scores from the beginning of the study to the 1-year follow-up discriminated between patients who improved and those who did not improve, as defined by the dichotomous return-to-work variable. A correct classification of approximately two thirds of the patients at the particular BPS cutoff point of change (2.5) may seem satisfactory. The area under the ROC curve was greater for the BPS scores than for the scores of the separate tests, with the Sock Test demonstrating the greatest area among them (Tab. 6). Although the 4-point ordinal scale of the Sock Test is a gross measure, it apparently signifies clinically important steps in a patient's rehabilitation. Responsiveness decreased when the Fingertip-to-Floor Test and the Lift Test were transformed from ratio to 4-point ordinal scales. These tests, we contend, should be used with their original scales to better measure change in the separate activities, but we believe they should be transformed to 4-point ordinal scales when used as part of the BPS. The 0 to 15 ordinal sum score of the BPS provided the most responsive measure, more responsive than the single tests, irrespective of ordinal or ratio scales.
| Conclusion |
|---|
| Footnotes |
|---|
Data were derived from a rehabilitation study, The Bergen Study: Back to Work, which was funded by the Department of Health and Social Welfare and administered by the Municipality of Bergen. The physical test data were collected at the College of Physiotherapy in Bergen, Norway.
The Bergen Study: Back to Work was performed according to the Helsinki Declaration and approved by the Regional Ethics Committee, Health Region III, Norway, and the Norwegian Data Inspectorate.
| References |
|---|