Research Reports
JK Freburger, PT, PhD, is Research Associate and Fellow, Cecil G. Sheps Center for Health Services Research, University of North Carolina, Chapel Hill, NC
TS Carey, MD, MPH, is Director, Cecil G. Sheps Center for Health Services Research, and Professor, Departments of Medicine and Social Medicine, School of Medicine, University of North Carolina, Chapel Hill
GM Holmes, PhD, is Research Associate and Fellow, Cecil G. Sheps Center for Health Services Research, University of North Carolina, Chapel Hill
(janet_freburger{at}unc.edu) Address all correspondence to Dr Freburger at 725 Airport Rd, CB#7590, Chapel Hill, NC 27599-7590 (USA)
Submitted March 31, 2005;
Accepted September 30, 2005
| Abstract |
|---|
Subjects. The participants were people who had spine problems lasting 3 months or longer and who were seen for an initial visit and a follow-up visit (N=4,479) at 1 of 17 US spine centers.
Methods. A propensity score approach was used to create a matched sample of participants who received physical therapy (intervention group) and participants who did not receive physical therapy (control group). The 2 groups were similar with regard to more than 50 baseline characteristics. Outcomes were assessed with the Oswestry Disability Index (ODI) and the 36-Item Short-Form Health Survey (SF-36).
Results. Both the intervention and control groups improved between the initial and the follow-up visits on ODI scores and on SF-36 physical function, role physical, and bodily pain scores. Although the amount of improvement in the outcome measures was significantly greater for the intervention group than for the control group, the differences were small (3–5 points). When the subgroup of participants who had the greatest propensity for receiving physical therapy was examined, differences in the amount of improvement between the intervention and control groups were larger (5–13 points).
Discussion and Conclusion. Physical therapy was effective in the management of chronic spine disorders in participants with the greatest propensity for receiving physical therapy. When the entire sample was considered, differences in the amount of improvement between the intervention and control groups were not clinically relevant. [Freburger JK, Carey TS, Holmes GM. Effectiveness of physical therapy for the management of chronic spine disorders: a propensity score approach.
Key Words: Back pain Effectiveness Neck pain Observational studies Physical therapy
| Introduction |
|---|
Spine disorders also are common reasons for ambulatory care visits to physical therapists. In 1988, 26% of all office-based physical therapy visits in the United States were for low back injuries, and 7% were for head or neck injuries.8 Data from the 1996 Medical Expenditure Panel Survey indicate that 25% of ambulatory care visits to physical therapists were for spine disorders.9 Among people who visit a physical therapist, a majority have chronic problems (eg, pain or dysfunction for more than 3 months).10–12
In the past decade, the results of randomized clinical trials (RCTs) and meta-analyses have provided various degrees of support for the efficacy of specific, nonsurgical physical interventions (that may be delivered by physical therapists) for the management of spine disorders.13–32 Evidence on the efficacy of care specifically provided by physical therapists is more limited. Korthals-de Bos et al33 reported that manual therapy delivered by physical therapists was more cost-effective than traditional physical therapy (ie, exercise, stretching, and functional activities) or general practitioner care in the management of neck pain. Skargren et al34 reported that the effectiveness and costs of chiropractic care and physical therapy were similar for the management of low back pain but that the direct costs of physical therapy in a few subgroups were lower. Mannion et al35 reported that outcomes were similar for subjects who had low back pain and who participated in physical therapy, a muscle strengthening program, or a low-impact aerobic exercise program. Frost et al36 reported that routine physical therapy was no more effective than one session of assessment and advice from a physical therapist for the management of low back pain.
Although there is evidence to suggest that some interventions (eg, exercise, spinal manipulation, massage) that may be delivered by physical therapists are efficacious in the management of spine conditions, evidence on the efficacy of care specifically provided by physical therapists is less conclusive. The RCTs that have been conducted also are limited because of small sample sizes, heterogeneous samples, or lack of standardized treatments. Evidence from observational studies assessing the effectiveness of routine care delivered to "typical" subjects with spine problems also can be helpful if adequate steps are taken to control for selection bias (eg, controlling for baseline differences between subjects who receive intervention and subjects who do not receive intervention) during data analysis. Observational studies are typically less costly, have larger samples, and often generate results that are more generalizable than those of RCTs. When combined with the results of RCTs, the results of observational studies can provide health care professionals and policy makers with more complete information to make management or policy decisions, especially when data from RCTs are limited.
The objective of this study was to use a large current database, the National Spine Network (NSN) database,37 to assess the effectiveness of physical therapy in the management of chronic spine conditions. In this study, we specifically assessed the effectiveness of physical therapy as a whole and did not assess the effectiveness of particular interventions delivered by physical therapists. Because we used observational data, we used a propensity score approach to account for selection bias.
| Method |
|---|
Specific questionnaires in the patient survey are the 36-Item Short-Form Health Survey (SF-36),38 the Oswestry Disability Index (ODI),39 and part of the Musculoskeletal Outcomes Data Evaluation and Management System (MODEMS) questionnaire.40 Questions on the SF-36 are based on symptoms and function in the preceding 4 weeks. The instructions on the ODI were modified as follows, "In the past week, please tell us how pain has affected your ability to perform the following activities ... ," so that questions are not specific to patients with low back pain.
Other patient-reported data include demographic information, symptoms, comorbidities, medications, work status, use of care, expectations about care, and satisfaction with care. Physician-reported data include patient signs and symptoms, surgical history, diagnosis, diagnostic tests ordered, intervention plan, and assessment of patient progress. Patients, physicians, and centers are identified in the database by identification numbers. No data that could be used to specifically identify a patient, physician, or center are provided.
On a weekly basis, participating clinics mail completed paper survey questionnaires to the central coordinating center. Survey questionnaires are returned if key data are missing or if the data are invalid. Data from the surveys then are keyed by data entry technicians. A second, independent data entry technician keys in the data again, and any discrepancies are resolved. Data then are loaded into the central NSN data repository.
Sample
Our sample consisted of subjects who were seen for an initial visit and a follow-up visit at 1 of 17 US spine centers from 1998 to 2002, who had NSN survey data for both visits, and who had chronic spine problems (ie, problems for more than 3 months) (N=8,206). Subjects with a follow-up visit less than 3 weeks or more than 1 year after the initial visit were excluded (n=1,192). We chose this time interval to maximize sample size. We reasoned that subjects would be matched on this variable, that a minimum of 3 weeks would be necessary to effect change in subjects with a short history of spine problems (ie, about 3 months), and that a maximum of 1 year might be necessary to effect change in subjects with a long history of spine problems (ie, more than 3 years). The distribution of the data also guided our decision. The 10th and 90th percentiles for time between initial and follow-up visits were approximately 3 weeks and 1 year, respectively. Subjects who were recommended for or scheduled for surgery by the physician (as indicated on the physician survey) or who were diagnosed with spinal cord compression at the initial visit also were excluded (n=2,535). The final sample consisted of 4,479 subjects. The mean number of subjects per center was 264 (SD=546).
Whether the subject received physical therapy was determined by examining both physician-reported and subject-reported data. Subjects were coded as receiving physical therapy if the physician indicated at the initial visit that physical therapy was recommended or scheduled and if the subjects indicated at the follow-up visit that they had seen a physical therapist. Forty-four percent of the subjects (n=1,963) received physical therapy between their initial and follow-up visits.
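The sample flow and the coding rule for physical therapy described above can be checked with a few lines of arithmetic. This is only an illustrative sketch: the counts come from the text, but the function and argument names are hypothetical, because the NSN field names are not given.

```python
# Sample flow reported in the Sample section (counts taken from the text).
initial_pool = 8206        # chronic spine problems, survey data for both visits
excluded_followup = 1192   # follow-up < 3 weeks or > 1 year after initial visit
excluded_surgery = 2535    # surgery recommended/scheduled, or spinal cord compression

final_sample = initial_pool - excluded_followup - excluded_surgery


def received_physical_therapy(physician_recommended: bool, subject_saw_pt: bool) -> bool:
    """Coding rule from the text: both data sources must agree.

    Argument names are hypothetical; the NSN field names are not given.
    """
    return physician_recommended and subject_saw_pt


n_pt = 1963
pct_pt = 100 * n_pt / final_sample

print(final_sample)    # 4479
print(round(pct_pt))   # 44
```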
Controlling for Selection Bias: Propensity Score Approach
With observational data, intervention assignment is not random, and large differences in observed covariates are likely to exist between subjects who received intervention (intervention group) and subjects who did not receive intervention (control group). For example, in our sample, the baseline ODI scores for subjects who received physical therapy were significantly lower, a result that indicates less impairment, than those for subjects who did not receive physical therapy. Differences in observed covariates between the intervention and control groups can lead to biased estimates of treatment effects because the groups are not comparable at baseline. The propensity score approach, introduced by Rosenbaum and Rubin,41 is an approach that can be used to reduce this bias.
The propensity score is defined as the conditional probability of being treated given a vector of observed covariates and represents a scalar summary of the covariate information.41,42 Information on the probability that a subject would have been treated (ie, the propensity score) is used to adjust the estimate of the treatment effect thereby creating a quasi-randomized sample.42 That is, 2 subjects, 1 in the intervention group and 1 in the control group, with the same propensity scores can be considered to have an equal probability of being in the intervention group or in the control group. Another interpretation is that subjects in the intervention and control groups with equal or nearly equal propensity scores will tend to have the same distribution of background covariates.43
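In symbols, the definition and the balancing interpretation above can be written as follows. The notation is our own choice for illustration: Z is the treatment indicator (1 = received physical therapy) and X is the vector of observed covariates.

```latex
e(\mathbf{x}) = \Pr(Z = 1 \mid \mathbf{X} = \mathbf{x}),
\qquad
\mathbf{X} \perp Z \mid e(\mathbf{X})
```

The second expression states the balancing property: among subjects with the same propensity score, the distribution of the observed covariates does not depend on treatment status.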
The propensity score is estimated with either a logit model or a probit model to predict treatment status from the observed covariates.41,43,44 Once the propensity score is estimated, different techniques can be used to create a sample that has comparable intervention and control groups. For this study, we used a matching approach to create a balanced sample (ie, equal numbers of subjects in the intervention and control groups). Our matching approach is described in more detail below.
Estimation of the Propensity Score
We used a logistic regression model in which the dependent variable was whether the participant received physical therapy. We included in the model more than 50 covariates that represented subject characteristics at the initial visit (eg, demographics, clinical status, health care payment, work status) and the physician intervention plan after the initial evaluation (eg, tests ordered, interventions prescribed). We also included a variable to represent the time between the initial and follow-up visits. Definitions and descriptive statistics for these covariates are presented in Table 1 by physical therapy intervention. The results of statistical tests of differences in means or frequencies also are reported and illustrate the differences in the observed covariates between subjects who did and subjects who did not receive physical therapy. For example, with regard to sex, a greater proportion of women received physical therapy.
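A minimal sketch of this estimation step is shown below. It is not the authors' Stata code: it fits a logistic regression by plain gradient ascent using only the Python standard library, and the two covariates and all data are synthetic and hypothetical, standing in for the more than 50 covariates used in the study.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logit(X, y, lr=0.5, n_iter=1000):
    """Fit logistic regression by batch gradient ascent on the log-likelihood.

    X: list of covariate rows (no intercept column); y: list of 0/1 flags.
    Returns coefficients [intercept, b1, b2, ...].
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for row, yi in zip(X, y):
            xi = [1.0] + list(row)
            resid = yi - sigmoid(sum(b * v for b, v in zip(beta, xi)))
            for j, v in enumerate(xi):
                grad[j] += resid * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def propensity_scores(X, beta):
    """Predicted probability of treatment for each covariate row."""
    return [sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], row)))
            for row in X]


# Synthetic data with two hypothetical covariates (a standardized baseline
# disability score and a female indicator). NOT the study's data.
random.seed(0)
X, y = [], []
for _ in range(300):
    odi_z = random.gauss(0, 1)
    female = 1.0 if random.random() < 0.5 else 0.0
    p_true = sigmoid(-0.3 - 0.8 * odi_z + 0.5 * female)
    X.append([odi_z, female])
    y.append(1 if random.random() < p_true else 0)

beta = fit_logit(X, y)
scores = propensity_scores(X, beta)
```

Each element of `scores` is the estimated probability that the corresponding synthetic subject received treatment, which is the quantity used for matching in the next step.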
Clinical characteristics.
Clinical characteristics included both physician-reported and subject-reported characteristics. Physician-reported characteristics included primary diagnosis, level of involvement, and whether the subject had nerve root compression. Diagnoses were categorized as follows: herniated disk, spinal stenosis, spondylosis, pain syndrome, sprain or strain, deformity, and "other." Three dichotomous variables represented level of involvement: cervical, thoracic, and lumbosacral. These 3 variables were not mutually exclusive. A dichotomous variable indicated whether the subject had nerve root compression.
Subject-reported characteristics included duration of the problem, number of comorbidities, and medication use. Duration of the problem was categorized as 3 months to 1 year, 1 to 3 years, and more than 3 years. The mean and median numbers of comorbidities were 1.7 (SD=1.7) and 1, respectively. Therefore, we created 3 dichotomous variables to indicate whether the subject had no comorbidities, 1 comorbidity, or 2 or more comorbidities. Dichotomous variables also indicated whether the subject smoked, had a history of surgery, or had a history of depression. The last variable was based on responses to 2 questions that are sensitive and specific screeners for depression45: "In the past year, have you had 2 weeks or more during which you felt sad, blue, or depressed or when you lost all interest in things that you usually cared about or enjoyed?" and "Have you felt depressed or sad much of the time in the past year?" If subjects responded "yes" to either question, we classified them as having a history of depression. Four variables represented the subject's current medication use: a continuous variable indicated the number of medications (prescription and nonprescription) that the subject was taking for the spine problem, a dichotomous variable indicated whether the subject was taking nonprescription medication, a dichotomous variable indicated whether the subject was taking prescription medication, and a dichotomous variable indicated whether the subject had taken pain medication daily over the preceding week.
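The comorbidity and depression coding rules described above might be expressed as follows. This is a sketch only; the variable and function names are hypothetical and are not taken from the NSN database.

```python
def comorbidity_dummies(n_comorbidities: int) -> dict:
    """Three mutually exclusive 0/1 indicators: none, one, two or more."""
    return {
        "comorbid_none": int(n_comorbidities == 0),
        "comorbid_one": int(n_comorbidities == 1),
        "comorbid_two_plus": int(n_comorbidities >= 2),
    }


def history_of_depression(sad_two_weeks: bool, depressed_much_of_year: bool) -> int:
    """1 if the subject answered "yes" to either screening question."""
    return int(sad_two_weeks or depressed_much_of_year)


print(comorbidity_dummies(3))
print(history_of_depression(False, True))
```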
We also included several subject-reported variables related to the current symptoms, functional status, and health status. Current symptoms were represented by a variable indicating the number of body parts that had bothered the subject or limited function in the preceding week. We also included a variable to indicate whether the subject had spine pain with radiating symptoms in the arm or leg. This variable was created on the basis of responses to the MODEMS questionnaire.40 A dichotomous variable indicated whether the subject reported that symptoms were getting worse. Functional status and health status were represented by ODI scores and SF-36 scores on the following subscales: physical function, role physical, bodily pain, and general health.
Health care payment and work-related characteristics.
The NSN survey asks 3 questions about social security disability, disability insurance, and workers' compensation. We created 2 dichotomous variables using the responses to these 3 questions. One variable indicated whether the subject was receiving any type of disability insurance. The second variable indicated whether the subject was receiving workers' compensation. We also included a variable to indicate whether the subject was considering legal action or had taken any legal action that was pending. Finally, we created 2 work-related variables to indicate whether the subject was on a leave of absence or stopped work because of the spine-related problem and to indicate whether the subject was disabled or retired because of the spine-related problem.
Intervention plan characteristics.
The physician portion of the NSN survey instructs physicians to indicate whether they ordered any of 14 diagnostic tests. Because a majority of the physicians who ordered tests ordered just one test, we created a dichotomous variable to indicate whether the physician ordered one or more tests. We considered this measure a proxy for illness severity and diagnostic certainty. Physicians also are instructed to indicate whether they prescribed medication, an injection, or any of the following 9 interventions: physical therapy, functional restoration, manipulation, transcutaneous electrical stimulation (TENS), brace or corset, vocational counseling, coping skills training, pain clinic, or "other." We created 7 dichotomous variables to indicate whether the physician prescribed medication, an injection, functional restoration, manipulation, brace or corset, coping skills training, or pain clinic. We excluded TENS, vocational counseling, and "other" because of the infrequent prescription of these interventions. Finally, we included a dichotomous variable to indicate whether the physician made a psychiatry or psychology consult referral.
Creation of the Matched Sample
Once the propensity scores were estimated, a greedy matching algorithm46 was used to create a balanced matched sample. Subjects who received physical therapy were randomly ordered, and the first intervention group subject was selected. All control group subjects with a propensity score within a given amount (or "caliper") of the intervention group subject's propensity score were retained. On the basis of the recommendations of Rosenbaum and Rubin,43 we chose a caliper equal to .40 of the pooled standard deviations of the estimated propensity scores. The control group subject with a propensity score closest to the intervention group subject's propensity score was identified, and then both the intervention group subject and the control group subject were removed from the pool. This process was repeated until all possible matches between the intervention group subjects and the control group subjects were made. We matched 69% of the intervention group subjects (n=1,362) for a final sample of 2,724 subjects. The mean propensity score for subjects who received physical therapy was .465 (SD=.186). The mean propensity score for subjects who did not receive physical therapy was .459 (SD=.182). Thirty-one percent of the subjects who received physical therapy (n=601) did not match a control group subject because there were no control group subjects who had propensity scores within .40 of the pooled standard deviations of the propensity scores of the intervention group subjects.
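The matching procedure described above can be sketched as a greedy 1:1 nearest-neighbor match within a caliper, without replacement. This is an illustration and not the authors' implementation: the function name, the toy subject identifiers, and the propensity scores are hypothetical, and the pooled standard deviation is assumed to be the square root of the average of the two group variances.

```python
import random
import statistics


def greedy_caliper_match(treated, controls, caliper_sd=0.40, seed=0):
    """Greedy 1:1 nearest-neighbor matching within a caliper.

    treated, controls: dicts mapping subject id -> propensity score.
    Returns a list of (treated_id, control_id) pairs.
    """
    # Assumed definition of the pooled SD: sqrt of the mean group variance.
    pooled_sd = ((statistics.variance(list(treated.values()))
                  + statistics.variance(list(controls.values()))) / 2) ** 0.5
    caliper = caliper_sd * pooled_sd

    order = list(treated)
    random.Random(seed).shuffle(order)   # random ordering of treated subjects
    available = dict(controls)           # controls still in the pool
    pairs = []
    for t_id in order:
        distances = {c_id: abs(ps - treated[t_id])
                     for c_id, ps in available.items()
                     if abs(ps - treated[t_id]) <= caliper}
        if not distances:
            continue                     # no control within the caliper
        best = min(distances, key=distances.get)
        pairs.append((t_id, best))
        del available[best]              # match without replacement
    return pairs


# Toy scores (hypothetical). "t3" has no control within the caliper.
treated = {"t1": 0.30, "t2": 0.55, "t3": 0.95}
controls = {"c1": 0.28, "c2": 0.33, "c3": 0.57, "c4": 0.50}
pairs = greedy_caliper_match(treated, controls)
print(sorted(pairs))   # [('t1', 'c1'), ('t2', 'c3')]
```

In the toy data, one treated subject is left unmatched, which mirrors how 601 of the 1,963 subjects who received physical therapy (31%) had no control within the caliper.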
Next, we assessed the "balance" of the covariates between the intervention and control groups by testing the differences in means or frequencies between subjects who did and subjects who did not receive physical therapy. We also assessed the balance of the covariates by calculating the standardized percent difference, which is the difference in the covariate means of the intervention and control groups expressed as a percentage of the average standard deviation.42 Differences larger than 10% would suggest that a covariate might not be appropriately balanced.47 The Figure summarizes our methods for identifying and creating the matched sample.
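As a worked example of the standardized percent difference, the sketch below applies it to the propensity score means and standard deviations reported for the matched sample. The formula assumes that the "average standard deviation" is the square root of the mean of the two group variances, which is a common convention but is not spelled out in the text.

```python
import math


def standardized_pct_difference(mean_t, mean_c, sd_t, sd_c):
    """Difference in means as a percentage of the pooled standard deviation."""
    pooled_sd = math.sqrt((sd_t ** 2 + sd_c ** 2) / 2)
    return 100 * (mean_t - mean_c) / pooled_sd


# Propensity score summary statistics reported for the matched sample.
d = standardized_pct_difference(0.465, 0.459, 0.186, 0.182)
balanced = abs(d) <= 10   # 10% threshold cited in the text
print(round(d, 1), balanced)   # 3.3 True
```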
Data Analysis
All analyses were conducted with Stata (version 8.0).* We used t tests to determine whether there were statistically significant differences between changes in the outcomes for the intervention and control groups. We also examined whether the effectiveness of physical therapy varied as a function of the propensity for receiving physical therapy by conducting subgroup analyses (using t tests) based on the quintile distribution of the propensity scores. Because of the heterogeneity of the sample, we hypothesized that the effectiveness of physical therapy might be greater for people who had a greater propensity to receive physical therapy (ie, had the group characteristics that made them more likely candidates for physical therapy). On the basis of the distribution of the propensity scores and to maintain adequate statistical power, we chose to break the sample into 5 subgroups.
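The quintile-based subgroup comparison can be sketched as follows. The text does not state which form of t test was used, so the unequal-variance (Welch) t statistic here is an assumption. The records are synthetic and deterministic, and the function names are hypothetical.

```python
import math
import statistics


def quintile_of(score, cutpoints):
    """Return the quintile (1-5) given the four quintile cutpoints."""
    return 1 + sum(score > c for c in cutpoints)


def welch_t(a, b):
    """Welch's t statistic for the difference in means of two samples."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(
        va / len(a) + vb / len(b))


def subgroup_effects(records):
    """records: list of (propensity_score, received_pt, change_in_outcome).

    Returns {quintile: (difference in mean change, Welch t statistic)}.
    """
    cutpoints = statistics.quantiles([r[0] for r in records], n=5)
    results = {}
    for q in range(1, 6):
        pt = [chg for ps, z, chg in records
              if z and quintile_of(ps, cutpoints) == q]
        ctl = [chg for ps, z, chg in records
               if not z and quintile_of(ps, cutpoints) == q]
        if len(pt) > 1 and len(ctl) > 1:
            diff = statistics.mean(pt) - statistics.mean(ctl)
            results[q] = (diff, welch_t(pt, ctl))
    return results


# Synthetic records (NOT study data): treated subjects improve 2 points more.
records = [(i / 100, i % 2 == 0, 5.0 + (2.0 if i % 2 == 0 else 0.0) + (i % 3))
           for i in range(1, 101)]
effects = subgroup_effects(records)
for q in sorted(effects):
    print(q, effects[q])
```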
We chose not to adjust for multiple t tests (eg, Bonferroni adjustment) for reasons outlined by Perneger.48 First, such adjustments are concerned with the general null hypothesis that all null hypotheses are true simultaneously. In our study, the general null hypothesis would be that the intervention and control groups are the same on all outcome variables. This hypothesis was not of interest to us. Second, a multiple t test adjustment implies that a given comparison will be interpreted differently depending on the number of tests performed. For example, testing one outcome measure might indicate that an intervention is effective, but if 5 outcome measures are tested, then the first outcome might no longer be significant. This scenario does not make clinical sense to us. Third, with adjustment for multiple t tests, the chance of a type I error is decreased at the expense of a type II error. Finally, the number of comparisons that we were conducting was relatively small.
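The second point can be illustrated numerically. In the sketch below, the p value is hypothetical; it shows how the same comparison would be judged significant when tested alone at the .05 level but not after a Bonferroni adjustment for 5 outcome measures.

```python
alpha = 0.05
n_tests = 5        # eg, 5 outcome measures
p_value = 0.03     # hypothetical p value for a single outcome

significant_alone = p_value < alpha
significant_bonferroni = p_value < alpha / n_tests   # per-test threshold .01

print(significant_alone, significant_bonferroni)   # True False
```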
| Results |
|---|
| Discussion |
|---|
Because subjects experienced an improvement in function regardless of physical therapy, examining differences in the amounts of improvement for subjects who did and subjects who did not receive physical therapy might be a more clinically relevant way to examine the data. The difference in improvement in ODI scores for the intervention and control groups was 2.8 (Tab. 3). Differences in improvement in SF-36 scores for the intervention and control groups ranged from 3.1 to 5.4 (Tab. 3). All of these values are below the clinically meaningful differences cited above.
One explanation for the small improvements in function may be related to the breadth of the sample with regard to clinical presentation. When the analyses were limited to subjects with a greater propensity for receiving physical therapy, then the differences in the amounts of improvement between the intervention and control groups were clinically significant. For example, for subjects with propensity scores in the fifth quintile, the difference in improvement in ODI scores between the intervention and control groups was 4.8, and the difference in improvement in SF-36 role physical scores was 12.5 (Tab. 4). In a post hoc analysis comparing subjects with a low propensity for receiving physical therapy (first quintile) with subjects with a high propensity for receiving physical therapy (fifth quintile), we found that subjects who were more likely to receive physical therapy were less impaired in baseline physical function and were more likely to have a sprain or strain, spondylosis, or pain diagnosis. A second explanation for the small improvements may be related to the chronicity of the subjects' spine problems. Approximately one half of the subjects reported spine problems for 3 years or more.
Although our results suggest that subjects who have chronic spine problems and who receive physical therapy show small improvements in function, one may question whether these gains are worth the added cost of receiving physical therapy. We did not specifically address this issue in our analysis but we did match subjects on the tests and interventions prescribed by the evaluating physician. If it is assumed that the subjects in the intervention and control groups did not receive other tests or interventions, then the differences in the costs of care for the intervention and control groups would be attributable to the cost of physical therapy. We did not have information on the number of physical therapist visits the subjects made or whether physical therapy was discontinued before the follow-up visit. We also did not have information on the specific interventions received by the subjects. Our analysis addressed whether subjects who received any amount or type of physical therapy had outcomes different from those of subjects who did not receive physical therapy. Our results may have been different had we limited our analysis to the amount or type of physical therapy received.
We used a propensity score approach to reduce selection bias by creating intervention and control groups that were matched on observed covariates. The null finding with respect to the SF-36 general health score and the significant findings with regard to the functional measures suggest that the propensity score matching was effective. Because physical therapy is less likely to affect overall health and because the estimated difference in improvement attributable to physical therapy was not statistically significant, the matching likely removed unobserved confounders. Finding an improvement in general health as a result of physical therapy might have been interpreted as evidence of a sample selection problem. Subjects more likely to show an improvement in general health received physical therapy at a higher rate; thus, the observed improvement in the other outcomes might have been attributable to a selection effect rather than a physical therapy effect.
Study Limitations
This study has several limitations. First, the generalizability of our results is limited to people who have back or neck pain and who visit spine care centers. These people may differ from people who have back or neck pain and who do not visit spine care centers. Unfortunately, data on people who do not visit spine care centers are not available to make comparisons. Participation in the NSN database also is voluntary at the levels of the individual and the spine care center. Therefore, the data are not nationally representative and may be subject to non-response bias. Spine care centers and therefore subjects in the West are underrepresented. Most of the data also are from spine care centers affiliated with academic institutions. Private-practice spine care centers may be underrepresented.
A second limitation is that we matched only 69% of the subjects who received physical therapy. That is, 31% of the subjects who received physical therapy could not be matched on baseline characteristics to subjects who did not receive physical therapy. To obtain a better idea of the differences between the 2 groups, we assessed whether demographic and health-related characteristics differed. The unmatched subjects who received physical therapy had better function and health, were slightly older, and were slightly less educated than the matched subjects who received physical therapy. The proportion of men in the unmatched group also was slightly higher than that in the matched group. These findings also limit the generalizability of our results and illustrate one of the limitations of matching for propensity scores (ie, often not every subject in the intervention group can be matched to a subject in the control group).
We chose a propensity score approach to increase the internal validity of our analysis at the expense of external validity. In a post hoc analysis with traditional regression methods for the entire sample (N=4,479), the effects of physical therapy were generally smaller for each outcome variable, by 1 or 2 points, a finding that represents a 20% to 30% decrease. The results of these 2 approaches may have diverged more if the effects of physical therapy had been greater.
The fact that the regression and propensity score methods produced similar results in our study agrees with a recent systematic review by Shah et al.52 These authors examined the results of 43 observational studies that used both traditional regression and propensity scores to assess treatment effects. Only 10% of the studies showed significant differences in effects for the 2 approaches. The authors pointed out, however, that many of the studies reviewed did not implement the propensity score approach appropriately. We believe that we implemented the propensity score approach appropriately in this study and consider it superior to traditional regression methods because of the theory behind the method and because it allowed us to conduct subgroup analyses with propensity scores.
A third limitation is that the reliability and validity of some of the self-report data in the NSN database have not been established. Some of the information supplied by the patients and physicians may be inaccurate. Although there are data to support the reliability or validity, or both, of the SF-36 and the ODI,53–58 studies on the reliability and validity of data for the ODI have focused on subjects with low back pain. The ODI included in the NSN survey was modified to be used for all subjects with spine problems. The usefulness of this measure for assessing thoracic pain and neck pain is not known. Some of the activities assessed with the ODI are likely to be most affected by low back pain (eg, sitting). We believe, however, that all of the activities assessed with the ODI also could be affected by neck or thoracic pain. As might be expected, baseline scores on the ODI were higher for participants with pain exclusively in the low back area (mean=43, SD=19) than for participants with pain exclusively in the neck or thoracic area. The means of the ODI scores for participants with neck pain and participants with thoracic pain were 33 (SD=20) and 40 (SD=18), respectively. Despite the differences in baseline values, the amounts of improvement for the groups at follow-up were the same (4 or 5 points).
Missing data also can be problematic in databases such as the NSN database. Data for this study were extracted from the master NSN database for 1998 to 2002, which contained more than 60,000 records on subjects with spine problems. Generally speaking, less than 10% of the data were missing for each of the variables in the master NSN database. In most cases, missing data were replaced in the master database by simple imputation techniques (eg, replacing with median, mean, or most frequent response). Missing demographic variables, however, were not imputed. In this study, we maintained the records on these subjects by creating missing dummy variables.
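One common way to retain records with missing covariate values is to add a 0/1 "missing" indicator and fill the missing value with a constant. The sketch below illustrates that general idea only; the exact coding used in this study is not described, and the field name and fill value are hypothetical.

```python
def add_missing_indicator(records, field, fill_value=0):
    """Return copies of the records with a `<field>_missing` indicator added.

    Missing values (None) are replaced by fill_value so that the record can
    stay in the model; the indicator flags which values were filled.
    """
    coded = []
    for rec in records:
        new = dict(rec)
        is_missing = rec.get(field) is None
        new[field + "_missing"] = int(is_missing)
        if is_missing:
            new[field] = fill_value
        coded.append(new)
    return coded


# Hypothetical demographic field; subject 2 did not answer.
subjects = [{"id": 1, "college_educated": 1},
            {"id": 2, "college_educated": None}]
coded = add_missing_indicator(subjects, "college_educated")
print(coded)
```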
A final limitation is that the propensity score approach controls only for observed differences between the intervention and control groups. Although we matched the intervention and control group subjects on more than 50 observed covariates, the intervention and control group subjects might have differed in characteristics that were not observed.
Limitations notwithstanding, the results of this study suggest that physical therapy is effective in the management of some chronic spine disorders. In observational studies, such as this study, confidence in causal conclusions is increased or decreased by examining how consistent the findings are with evidence from other studies.59 Although some studies support the efficacy of specific interventions that may be delivered by a physical therapist, studies assessing the efficacy of physical therapy as a whole in the management of chronic spine problems are more limited. Various investigators have reported minimal, if any, differences between physical therapy and chiropractic care,34 muscle strengthening,35 aerobic exercise,35 or general practitioner care33 in the management of chronic spine disorders.
| Conclusion |
|---|
| Footnotes |
|---|
This research was funded by a research grant from the Foundation for Physical Therapy.
* Stata Corp, 4905 Lakeway Dr, College Station, TX 77845.
| References |
|---|