PHYS THER
Vol. 86, No. 6, June 2006, pp. 817-824


Research Reports

Estimates of Quality and Reliability With the Physiotherapy Evidence-Based Database Scale to Assess the Methodology of Randomized Controlled Trials of Pharmacological and Nonpharmacological Interventions

Norine C Foley, Sanjit K Bhogal, Robert W Teasell, Yves Bureau and Mark R Speechley

NC Foley, MSc (Candidate), is Research Associate, Department of Physical Medicine and Rehabilitation, Parkwood Hospital, St Joseph’s Health Care London, London, Ontario, Canada. Address all correspondence to Ms Foley at 801 Commissioner’s Rd East, London, Ontario, Canada N6C 5J1
SK Bhogal, MSc, is PhD Candidate, Department of Epidemiology and Biostatistics, McGill University, Montreal, Quebec, Canada
RW Teasell, MD, FRCPC, is Professor and Chair/Chief, Department of Physical Medicine and Rehabilitation, Parkwood Hospital, St Joseph’s Health Care London and the University of Western Ontario, London, Ontario, Canada
Y Bureau, PhD, is Statistical Consultant, Imaging Program, Lawson Health Research Institute, London, Ontario, Canada
MR Speechley, PhD, is Associate Professor, Department of Epidemiology and Biostatistics, Faculty of Medicine and Dentistry, Schulich School of Medicine and Dentistry, University of Western Ontario

(norine.foley{at}sjhc.london.on.ca)


Submitted July 28, 2005; Accepted January 16, 2006


    Abstract
 
Background and Purpose. Systematic reviews and meta-analyses often include an evaluation of the methodological quality of the individual studies that have been included; these evaluations usually are conducted by at least 2 individuals. The objective of this study was to assess the methodological quality and reliability of a series of randomized controlled trials (RCTs) of both pharmacological and nonpharmacological interventions by use of the 10-item Physiotherapy Evidence-Based Database (PEDro) Scale.

Methods. Two abstractors independently reviewed 81 RCTs assessing a variety of interventions. The Cohen kappa statistic and the intraclass correlation coefficient (ICC) were used to assess agreement between abstractors.

Results. The average total PEDro scores were 5.94 (SD=1.43) for all studies combined, 6.88 (SD=1.2) for pharmacological studies, and 5.29 (SD=1.26) for nonpharmacological studies. The median score for pharmacological studies was significantly higher than that for nonpharmacological studies (7 versus 5). Pair-wise kappa scores ranged from a low of .452 for concealed allocation among drug trials to perfect agreement (1.00) for randomization and reporting of results from between-group comparisons. The ICCs associated with the cumulative PEDro score were .91 (95% confidence interval [CI]=.83–.94) for all studies, .89 (95% CI=.78–.95) for pharmacological studies, and .91 (95% CI=.84–.952) for nonpharmacological studies.

Discussion and Conclusion. The methodological quality for pharmacological interventions was significantly higher than that for nonpharmacological interventions. There was good agreement between raters at an individual item level and in total PEDro scores. A lack of reporting clarity, poor organization of the report, or the failure to include salient details contributed to less-than-perfect agreement between raters.

[Foley NC, Bhogal SK, Teasell RW, et al. Estimates of quality and reliability with the Physiotherapy Evidence-Based Database Scale to assess the methodology of randomized controlled trials of pharmacological and nonpharmacological interventions. Phys Ther. 2006;86:817–824.]

Key Words: Meta-analysis • Quality assessment • Reliability • Systematic review


    Introduction
 
An assessment of the methodological quality of individual randomized controlled trials (RCTs) included in meta-analyses and systematic reviews is commonly undertaken; this process is intended to identify potential sources of bias that may compromise both the internal validity and the external validity of a study.1 Despite the continuing debate over the relative merits of this endeavor and in the absence of a gold standard, there has been a proliferation of various scales and checklists intended to evaluate key components of trial quality. Most scales typically award a series of points when study criteria are met. Theoretically, higher overall scores indicate studies with better methodological quality, which in turn yield estimates of intervention effects that are closer to the true results. The score that an individual RCT receives as a result of this process may determine its inclusion in the review or its weighting in the pooled results. Often, the final quality score that an RCT receives is based on consensus ratings from 2 or more study abstractors. For this reason, regardless of the quality assessment tool chosen, good agreement between raters must be established.

Most quality assessment tools provide standardized administration guidelines to ensure uniform application; however, the scores awarded by abstractors depend markedly on the level of methodological detail described in each study. When reporting lacks clarity, individual interpretation may differ between abstractors, affecting the consistency of agreement and thereby reducing reliability. Disagreements typically are resolved by third-party review or through arbitration between reviewers. Surprisingly, although many scales are in use, few estimates of reliability have been published.

The Physiotherapy Evidence-Based Database (PEDro) Scale, developed by the Centre for Evidence-Based Physiotherapy, is an example of one such quality assessment scale.2 The scale is based largely on the Delphi List3 and was developed to assess the methodological quality of RCTs specifically pertaining to physical therapy interventions that were included in the database. The interrater reliability of the PEDro Scale was assessed previously in only a single trial, and no studies assessing the reliability of the Delphi List have been published. By use of the kappa statistic for pair-wise comparisons, reliability estimates determined with the PEDro tool for 2 raters assessing 120 RCTs were found to range from .50 to .79 after consensus was achieved (.36 to .80 before consensus). The intraclass correlation coefficient (ICC) for total scores was .56 (95% confidence interval [CI]= .47–.65) for ratings by individual raters. The percentage of agreement ranged from 70% to 98%.4

In an effort to determine which tools had been used previously to assess the methodological quality of published reviews, we surveyed 10 randomly selected reviews that evaluated physical therapy interventions from the Cochrane Database of Systematic Reviews. We found a wide range of approaches used to assess the methodological quality of individual RCTs.5–14 Most reviews used a qualitative, checklist approach, whereby individual methodological components were noted to be present or absent, but a total score was not determined. The number of quality items assessed for adequacy ranged from 1 to 10 and most frequently included randomization, allocation concealment, masking, intention-to-treat analysis, and accounting for dropouts. In 1 case, although 10 individual items had been summed, the authors noted that the purpose was to gain an overall impression of quality, and the data were not used for quantitative purposes.8 Three reviews used previously validated tools to assess methodological quality. The Jadad Scale15 was used in 2 reviews,13,14 and the PEDro Scale was used in the third.9 Two reviews used the Delphi List, another previously validated tool, with modifications.5,6 Two reviews quantitatively assessed only whether concealed allocation had been adequately described, although they included a more comprehensive list of quality indicators.7,8 None of these reviews reported estimates of reliability between raters.

We previously used the PEDro Scale to assess the methodological quality of 272 RCTs that were included in a systematic review of the stroke rehabilitation literature.16 In addition to physical and rehabilitation therapies (n=215), many of the therapies assessed in this review were pharmacological or surgical (n=57). The methodological quality of pharmacological trials included in this review was found to be significantly higher than that of nonpharmacological trials when the PEDro Scale was used (mean±SD=6.77±1.3 versus 5.53±1.3; P<.0001).17 The difference in quality between study types was largely attributable to the inherent difficulty of designing single-blind studies (ie, those in which participants are not aware of their group assignments) for nonpharmacological interventions, although double-blind studies (ie, outcome assessors are not aware of group assignments) also were less frequent for nonpharmacological interventions. As a means of formulating final conclusions in this review, only studies that achieved a PEDro score of 6 or greater were used when there was an abundance of evidence. Although an assessment of the reliability of the PEDro Scale was not included in this review and could not be conducted retrospectively, it was of interest to establish whether reliability estimates would vary depending on intervention type (pharmacological versus nonpharmacological).

Therefore, the purposes of this study were to assess how well 2 examiners agreed on specific items when using this scale (reliability), to determine whether the reliability and methodological quality differ between pharmacological and nonpharmacological studies, and to identify what aspects of RCTs tend to detract from their quality because these aspects are not incorporated into the study’s design or they are not reported or stated clearly. We anticipated that there would be good agreement between study types (pharmacological and nonpharmacological) for individual PEDro items and that the composite scores for pharmacological trials would once again be higher than those for nonpharmacological trials because of the inability in the latter to keep subjects unaware of group assignments (masking). Discrepancies attributable to interpretation and differences in scoring patterns between study types are discussed, with an emphasis on highlighting the practical considerations encountered with the PEDro Scale under typical use.


    Method
 
Article Selection

This study was 1 component of a master’s thesis, the objective of which was to compare differences in effect sizes reported between trials that used double blinding and those that did not. In order to examine this contrast, studies that used pharmacological approaches (which were more frequently masked studies) and those that used nonpharmacological approaches (which, because of the nature of the interventions provided, were more frequently not masked studies) to treat the same medical condition were sought. Therefore, the inclusion criteria for selecting the meta-analyses were predefined by one of the authors (SKB). Previously published Cochrane Collaboration meta-analyses that evaluated both intervention approaches (pharmacological and nonpharmacological) for the same medical condition were retrieved, and the methodology of the trials was assessed with the PEDro Scale. Only 3 comparisons for medical conditions that were the subjects of both pharmacological and nonpharmacological investigations emerged: antidepressant treatments versus cognitive behavioral therapy for bulimia nervosa, excitatory amino acid antagonists versus surgery for stroke, and calcium supplementation versus exercise therapy for osteoporosis.

PEDro Scale

The PEDro Scale consists of 10 criteria assessing the quality of study components related to internal validity2 (Tab. 1). Each item receives either a "yes" or a "no" score. The maximum score that a study can receive is 10. The PEDro score allocates up to 3 points for the level of masking achieved (eg, masking of subject, therapist, and assessor), 2 points for randomization procedures (random allocation, concealment of allocation), 3 points for the reporting of appropriate data (baseline characteristics, between-group comparisons, and point and range estimates of efficacy), and 1 point each for analysis of data (intention-to-treat analysis) and adequacy of follow-up. For the purposes of this review, follow-up (criterion 7) was considered adequate if all of the originally randomized participants were accounted for at the end of the study. This interpretation differs from that described by the PEDro Scale, which defines adequacy as the measurement of the main outcome in more than 85% of the participants. We modified this criterion because we believed that substantial bias could be introduced through imbalanced dropout rates between groups, even though 85% or more of the original participants were analyzed.18
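The scoring rule described above can be sketched in a few lines of Python. This is an illustrative sketch only: the field names, the `PedroRating` class, and the `follow_up_adequate` helper are hypothetical labels introduced here, and the order of the items is not meant to reproduce Table 1. The follow-up helper encodes the modified criterion used in this study rather than the 85% rule of the original scale.

```python
from dataclasses import dataclass, fields


@dataclass
class PedroRating:
    """One rater's yes/no judgments for the 10 scored criteria (hypothetical names)."""
    random_allocation: bool
    concealed_allocation: bool
    baseline_comparability: bool
    masked_subjects: bool
    masked_therapists: bool
    masked_assessors: bool
    adequate_follow_up: bool
    intention_to_treat: bool
    between_group_comparisons: bool
    point_estimates_and_variability: bool

    def total(self) -> int:
        # Each "yes" contributes 1 point, so the total ranges from 0 to 10.
        return sum(bool(getattr(self, f.name)) for f in fields(self))


def follow_up_adequate(n_randomized: int, n_accounted_for: int) -> bool:
    """Modified criterion: every randomized participant must be accounted for.

    The original PEDro wording instead requires the main outcome to be
    measured in more than 85% of participants.
    """
    return n_accounted_for == n_randomized


# Hypothetical trial: 117 of 120 participants accounted for (97.5%), which
# would satisfy the 85% rule but not the stricter criterion used here.
example = PedroRating(
    random_allocation=True,
    concealed_allocation=False,
    baseline_comparability=True,
    masked_subjects=True,
    masked_therapists=False,
    masked_assessors=True,
    adequate_follow_up=follow_up_adequate(120, 117),
    intention_to_treat=False,
    between_group_comparisons=True,
    point_estimates_and_variability=True,
)
print(example.total())
```

The hypothetical trial loses the follow-up point only because of the stricter criterion, which shows how the modification can lower a total score relative to the published scale.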

The methodology of each study was scored by 2 experienced, independent raters who were familiar with the PEDro tool and who were well matched in terms of education and knowledge in research methodology (NCF and SKB), although neither had formal training in the application of the PEDro tool. Both raters were unaware of each other’s results until all of the studies were assessed, at which point discrepancies were identified and discussed. Discrepancies were classified as "error" or "interpretation." Errors occurred when it was evident that 1 of the abstractors had simply missed the relevant statement in the original article; in these cases, consensus was easy to achieve. Interpretation discrepancies occurred when the abstractors interpreted the presence or absence of an item differently because of its presentation in the original article. Items of disagreement and reasons for discrepancies were recorded and tabulated.

Statistical Analysis

Both mean (±SD) and median (interquartile range) composite PEDro scores, achieved after consensus was reached, were analyzed. Differences in median scores between pharmacological and nonpharmacological studies were analyzed by use of the Mann-Whitney U test. Differences in proportions of studies meeting criteria between intervention types (nonpharmacological and pharmacological) were evaluated by use of the chi-square statistic with a continuity correction.
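For readers unfamiliar with the Mann-Whitney U test, the following Python sketch shows how the statistic can be computed from ranks, with tied scores (common for integer PEDro totals) given the mean of their ranks. It is a minimal illustration, not the SPSS procedure used in this study; the function names and the example scores are hypothetical, and no P value is computed.

```python
def midranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of positions i..j, made 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U_x, U_y) for two independent samples x and y."""
    ranks = midranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:len(x)])
    u_x = rank_sum_x - len(x) * (len(x) + 1) / 2
    u_y = len(x) * len(y) - u_x  # the two U values always sum to n_x * n_y
    return u_x, u_y


# Hypothetical composite scores for a few drug and nondrug trials.
drug_scores = [7, 8, 6, 7]
nondrug_scores = [5, 5, 6, 4]
print(mann_whitney_u(drug_scores, nondrug_scores))
```

Software packages differ in which of the two U values they report, so a published U should be read alongside the group sizes.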

The Cohen kappa statistic assessing pair-wise comparisons was used to estimate the interrater reliability of each of the 10 PEDro items for all intervention arms. The kappa statistic is a popular chance-corrected measure of agreement between 2 raters assessing a nominal-level variable.19 The kappa statistic can range from −1.00 to 1.00; a value of 0 indicates agreement no better than that expected by chance, and a higher value is indicative of better reliability. Agreement between data abstractors on total composite PEDro scores was assessed by use of the ICC with a 2-way mixed-effects model (with the absolute agreement definition). In addition to scores for all studies combined, the kappa and ICC scores were derived for pharmacological and nonpharmacological interventions. SPSS version 12* was used for all analyses. A P value of less than .05 was considered statistically significant.
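As an illustration of the ICC described above, the following Python sketch computes a 2-way, absolute-agreement ICC from the mean squares of a studies-by-raters table. It assumes the single-measure form; the text does not state whether single or average measures were requested from SPSS, so this choice, the function name, and the example scores are assumptions made for illustration.

```python
def icc_absolute_agreement(scores):
    """Single-measure, 2-way, absolute-agreement ICC.

    `scores` is a list of rows, one per study, each row holding one
    total score per rater.
    """
    n = len(scores)       # number of studies
    k = len(scores[0])    # number of raters
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Sums of squares for studies (rows), raters (columns), and residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # Absolute agreement penalizes systematic differences between raters
    # through the rater mean square in the denominator.
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical total PEDro scores from 2 raters for 6 studies.
totals = [[6, 6], [5, 6], [7, 7], [4, 4], [8, 7], [5, 5]]
print(round(icc_absolute_agreement(totals), 3))
```

Because the absolute-agreement definition counts a consistent offset between raters as disagreement, a rater who scores every study 1 point higher than a colleague would lower this coefficient even if the ranking of studies were identical.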


    Results
 
Descriptive Statistics

Eighty-one RCTs from 6 Cochrane reviews were retrieved.20–25 Two trials included both drug versus placebo and therapy versus placebo arms as part of the trial design, resulting in a total of 83 scoring opportunities; 49 of these assessed nonpharmacological interventions, and 34 assessed pharmacological interventions. The publication dates for the individual RCTs ranged from 1961 to 2002.

Quality

The percentages of all studies that met criteria for each of the PEDro items after consensus was reached are shown in Table 2. The final PEDro scores, achieved after consensus was reached, ranged from a low of 2 (1.2%) to a high of 10 (1.2%). The most frequently occurring intermediate scores were 5 (25.3%), 6 (25.3%), and 7 (28.9%). Seven studies achieved a score of 8 (8.4%), and no studies achieved a score of 9.


 
Table 2. Percentages of Studies That Met Physiotherapy Evidence-Based Database (PEDro) Criteria

 
The average total PEDro scores were 5.94 (SD=1.43) for all studies combined, 6.88 (SD=1.2) for pharmacological studies, and 5.29 (SD=1.26) for nonpharmacological studies. The median score for pharmacological studies was significantly higher than that for nonpharmacological studies (7 versus 5, U=249.5, P<.0001). A higher percentage of drug studies than of nondrug studies met PEDro criteria for masking, adequacy of follow-up, and intention-to-treat analysis, whereas nondrug studies more frequently met criteria for concealed allocation, baseline comparability, and the inclusion of point estimates. Trials evaluating pharmacological interventions were more frequently masked trials with regard to both subjects and outcome assessors than were trials evaluating nonpharmacological interventions. The differences in proportions were statistically significant (97.1% versus 0%, P<.0001, for masking of subjects and 85.3% versus 32.7%, P<.0001, for masking of assessors).
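The comparison of proportions for masking of assessors can be illustrated with a chi-square test with continuity correction, as sketched below in Python. The counts (29 of 34 drug arms and 16 of 49 nondrug arms) are not reported in the text; they are back-calculated here from the reported percentages of 85.3% and 32.7% on the assumption that the 83 scoring opportunities split into 34 pharmacological and 49 nonpharmacological arms, which is the split that reproduces those percentages. The function name is hypothetical.

```python
import math


def chi_square_yates(a, b, c, d):
    """Chi-square with continuity correction for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, P value for 1 degree of freedom).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    # Yates correction: reduce |ad - bc| by n/2, but never below zero.
    diff = max(abs(a * d - b * c) - n / 2, 0.0)
    stat = n * diff ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, the upper tail equals erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p


# Assumed counts: masked / not masked assessors, drug arms then nondrug arms.
stat, p = chi_square_yates(29, 5, 16, 33)
print(round(stat, 2), p < 0.0001)
```

Under these assumed counts, the corrected statistic is about 20.3 with a P value below .0001, which is in line with the significance level reported above.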

Reliability

Regardless of study type, there was 100% agreement between raters for PEDro Scale item randomization and reporting of between-group comparisons, whereas the poorest agreement was found for concealed allocation and baseline comparability. The kappa scores for all studies and the breakdown for drug and nondrug studies are shown in Table 3. Kappa scores varied from a low of .452 for concealed allocation among drug trials to perfect agreement (1.00) for randomization and reporting of results from between-group comparisons. Because of the inherent limitations of the statistical test, the kappa score was 0 for 3 of the PEDro items, despite a high percentage of agreement, and the kappa score was small and negative for 1 item, despite a high percentage of agreement (the Appendix shows examples of these 2 phenomena). The ICCs associated with the cumulative PEDro score were .91 (95% CI=.83–.94) for all studies, .89 (95% CI=.78–.95) for pharmacological studies, and .91 (95% CI=.84–.952) for nonpharmacological studies.


 
Table 3. Mean (SE) Kappa Scores for Individual Components of the Physiotherapy Evidence-Based Database (PEDro) Scale

 

    Discussion
 
Quality

Regardless of intervention type (drug versus nondrug), at least 75% of the trials met the criteria for random allocation, baseline comparability, between-group comparisons, adequacy of follow-up, and the inclusion of point estimates and measures of variability. Fewer than 30% of the trials fulfilled the criteria for concealed allocation or intention-to-treat analysis. Paradoxically, both of these components of trial design have been shown to be the most important in reducing bias.26 These results are consistent with those that we previously reported.17 Studies of either intervention type were infrequently awarded points for masking of therapist, as often there was no mention of a therapist, regardless of masking status. In the absence of reporting, a point could not be awarded. Although the absence of reporting does not mean that masking did not occur, this is a limitation of all tools that rely exclusively on the examination of the written publication.

Moseley et al2 also assessed the percentages of studies meeting PEDro criteria by evaluating 2,376 RCTs within the PEDro database. Our results are remarkably similar to theirs, with a few exceptions, which may have been attributable to either different inclusion criteria for the studies or the correctness of the ratings. Moseley et al2 reported that 94% of studies fulfilled the criteria for randomization, whereas we included only studies that were clearly randomized and excluded quasi-randomized or controlled trials. Higher percentages of studies included in the present review met the criteria for baseline comparability (84% versus approximately 65%) and for between-group comparisons (100% versus 89%).

Differences in Quality Between Pharmacological and Nonpharmacological Studies

The cumulative PEDro scores of RCTs evaluating drug interventions were significantly higher than those of RCTs evaluating therapy interventions, although drug studies did not consistently outperform therapy trials on an item-by-item basis. The percentages of nonpharmacological studies that met criteria for concealed allocation, baseline comparability, and the inclusion of point estimates were slightly higher, although the differences were small and no statistical tests of significance were performed. As predicted, the greatest difference in scores between intervention types was for subject masking, in which virtually all drug trials succeeded, whereas none of the therapy trials did. An unexpected finding was the difference between study types in the area of masked assessments; only a small percentage (33%) of nondrug trials succeeded in the masking of outcome assessors. Moseley et al2 also reported that a small percentage of trials used masked assessments in evaluations of physical therapies. However, the difference that we report with respect to study type is not easily explained. Although the difficulties with masking of subjects to group assignments in nonpharmacological trials are obvious, the obstacles to ensuring masked assessments are less so. A possible explanation for the shortcoming of therapy trials could be a lack of resources (eg, additional personnel were not available to carry out masked assessments), as these trials may have been conducted in a clinical setting rather than in a research setting. There was no more than a 5% difference between intervention types in the number of studies that met criteria for any of the other 8 PEDro items.

Estimates of Reliability

Although there is no consensus as to what constitutes a "good" or "acceptable" kappa score, for agreement that is less than 100%, guidelines interpreting the strength of agreement have been published.27–29 With the use of any 1 of these 3 published guidelines, our agreement ranged from substantial or good to perfect for each of the 10 pair-wise comparisons. These estimates of reliability are consistent with those in 1 other published report.4 To date, we are not aware of other evaluations of the reliability of the PEDro tool.

In many cases, scoring discrepancies arose from ambiguity in reporting, as it was unclear whether criteria had been satisfied on the basis of what was explicitly stated. The extent to which raters adhere to a literal interpretation of the scoring criteria will affect the consistency of the agreement. For example, in 1 case, the term "placebo" was used, although the word "blinding" or "masking" was not. In another case, the authors reported that they attempted to keep assessors unaware of group assignments. Disagreement also arose when details of the study methodology appeared outside of the Method section. Although the value of assessing differences in baseline prognostic factors in RCTs has been the subject of debate,30 it was the second largest source of scoring disagreement. There were 3 cases in which 1 of the abstractors believed that too few clinically important baseline variables had been compared between groups. On 2 occasions, a point was not awarded when there appeared to be a significant difference for a variable thought to be important, on the basis of either the results of a significance test or the abstractor’s own judgment. In these cases, abstractors had to make an educated guess as to the potential for bias arising from the imbalance. Subject area knowledge and expertise were influential in scoring under these conditions and resulted in the second lowest kappa score. There were also 3 disagreements as to whether criteria had been fulfilled for the reporting of point estimates and measures of variability. When improvements over time between intervention groups are reported, point estimates may not be applicable, which can lead to uncertainty in scoring.

The high ICC for the composite PEDro scores also suggests good agreement, although this test does not consider the way in which the final scores were reached. It is possible for 2 raters to reach similar scores without achieving consistency on an item-by-item basis.
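The point that identical totals can conceal item-level disagreement can be made concrete with a short Python sketch. The two rating vectors below are hypothetical and are not drawn from the studies reviewed; `item_agreement` is an illustrative helper name.

```python
def item_agreement(rater_a, rater_b):
    """Proportion of items on which two raters gave the same yes/no rating."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


# Hypothetical item ratings (1 = criterion met) for a single trial.
rater_a = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]
rater_b = [1, 1, 1, 0, 0, 1, 0, 1, 1, 1]

print(sum(rater_a), sum(rater_b))        # both totals are 7
print(item_agreement(rater_a, rater_b))  # yet the raters agree on only 6 of 10 items
```

An agreement statistic computed on the totals alone would treat this trial as a case of perfect agreement, which is why the item-level kappa scores are reported alongside the ICC.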

Estimates of Reliability Between Intervention Classes (Drug and Nondrug)

In general, there was better agreement between raters for the nonpharmacological studies than for the pharmacological studies, although the differences were small. The item that caused the largest number of disagreements between intervention types was concealed allocation (κ=.452 for drug studies versus κ=.788 for nondrug studies). This item also represented the criterion met least often (Tab. 2). One possible explanation for the poor agreement was that although it was generally clear for nonpharmacological studies that there had been no attempt to ensure that allocation had been concealed, pharmacological studies more frequently had attempted to achieve this goal, although ambiguous language and incomplete descriptions of processes resulted in disagreements. This finding was particularly true for multicenter trials, in which the term "concealed allocation" often was not used, although one rater thought that it could be inferred, because centralized assignment usually is associated with this trial design. Adequately concealed randomization procedures, such as the use of opaque, sequentially numbered envelopes or off-site randomization, ensure that the investigators have no foreknowledge of subject group assignments and reduce bias by minimizing the possibility that the randomization schedule can be subverted. Although this concept seems straightforward, Schulz and Grimes30 suggested that the definition of concealed allocation generally is not well understood and is often confused with both masking and randomization, a situation that further underscores the need for the use of clear and explicit language. Although the percentages of studies that were awarded a point for masking of subjects and intention-to-treat analysis were low, there was still good agreement between raters, suggesting that the criteria had been stated explicitly.

The mathematical limitations of the kappa statistic were evident for several cases in which the kappa value was 0, despite a high percentage of raw agreement. This situation occurred when 1 of the marginal totals was 0, so that the product of marginal totals in which it appeared was 0 and remained 0 after it was divided by n; the expected agreement then equaled the observed agreement. The kappa value also took on a negative and nonsensical value in 2 cases in which, again, there was a high percentage of raw agreement. This result occurred because the value for expected agreement was greater than that for observed agreement. Examples of these calculations are provided in the Appendix for clarification. As far as we are aware, there is no agreed-upon solution to this dilemma (eg, adding 0.5 to each cell, in a manner similar to a Yates correction of a chi-square statistic).
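Both phenomena can be reproduced with a short Python sketch of the Cohen kappa calculation for a 2×2 table. The cell counts below are hypothetical and were chosen only to total 83 ratings with high raw agreement; they are not the counts shown in the Appendix, and `cohen_kappa` is an illustrative function name.

```python
def cohen_kappa(both_yes, a_yes_b_no, a_no_b_yes, both_no):
    """Return (observed agreement, expected agreement, kappa) for 2 raters."""
    n = both_yes + a_yes_b_no + a_no_b_yes + both_no
    observed = (both_yes + both_no) / n

    # Marginal totals for each rater.
    a_yes = both_yes + a_yes_b_no
    a_no = n - a_yes
    b_yes = both_yes + a_no_b_yes
    b_no = n - b_yes

    # Chance agreement: sum of the products of matching marginal totals.
    expected = (a_yes * b_yes + a_no * b_no) / n ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return observed, expected, (observed - expected) / (1 - expected)


# Case 1: rater A never awards the point, so one marginal total is 0.
obs1, exp1, kappa1 = cohen_kappa(0, 0, 2, 81)

# Case 2: each rater awards the point once, but never for the same trial.
obs2, exp2, kappa2 = cohen_kappa(0, 1, 1, 81)

print(round(obs1, 3), round(kappa1, 3))
print(round(obs2, 3), round(kappa2, 3))
```

In both hypothetical cases, the raters agree on 81 of 83 ratings (about 98%), yet kappa is 0 in the first case and slightly negative in the second, because nearly all of the agreement is attributed to chance when an item is almost never rated "yes."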

Limitations of the study are that the "true" PEDro scores remain unknown, and consensus agreement on scale items does not necessarily mean that the raters were correct. In this respect, no claim can be made about the validity of the tool. However, we have likely successfully simulated a practical situation faced by many representative users of the tool attempting to score studies for potential inclusion in systematic reviews. Although the kappa value has statistical limitations, it is a commonly used statistical tool that many clinicians are comfortable using.


    Conclusion
 
Evaluating the methodological quality of a clinical trial is often difficult because of a lack of reporting clarity, poor organization of the report, or the author’s failure to include salient details. In the present study with the PEDro tool, 2 raters unanimously agreed on the reporting of 2 trial components—randomization and whether between-group comparisons had been reported—for 81 RCTs. The poorest agreement was found for concealed allocation and baseline comparability. Therefore, there appears to be greater clarity of reporting for certain components of study methodology than for others. Although many scales and checklists are in use, there is no consensus as to which one(s), if any, can distinguish definitively between well and poorly conducted trials. However, certain components of trial methodology, such as randomization, concealment of allocation, masking, and intention-to-treat analysis, are known to influence the validity of results. Therefore, it is imperative that these items, which are most commonly associated with the potential for bias, be reported in the methodology section of a trial with transparent, unambiguous language, so that physical therapists are better equipped to identify studies that are more likely to yield valid results.


 
Table 1. Physiotherapy Evidence-Based Database Scale2,a

 

    Appendix
 


Figure 1
 
Sample Calculations Demonstrating the Limitations of the Kappa Statistic
 


    Footnotes
 
This study was modified from one component of a master’s thesis completed by Sanjit Bhogal at the University of Western Ontario in the Department of Epidemiology and Biostatistics (2004).

Ms Foley and Sanjit Bhogal were both involved in concept/idea/research design, writing, and data collection and analysis. Dr Bureau and Dr Speechley were consultants on the project. Dr Teasell procured funds and was a consultant.

This project was funded by the Canadian Stroke Network and the Heart & Stroke Foundation of Ontario.

* SPSS Inc, 233 S Wacker Dr, Chicago, IL 60606.


    References
 

  1. Verhagen AP, de Vet HC, de Bie RA, et al. The art of quality assessment of RCTs included in systematic reviews. J Clin Epidemiol 2001;54:651–654.
  2. Moseley AM, Herbert RD, Sherrington C, Maher CG. Evidence for physiotherapy practice: a survey of the Physiotherapy Evidence Database (PEDro). Aust J Physiother 2002;48:43–49.
  3. Verhagen AP, de Vet HC, de Bie RA, et al. The Delphi list: a criteria list for quality assessment of randomized clinical trials for conducting systematic reviews developed by Delphi consensus. J Clin Epidemiol 1998;51:1235–1241.
  4. Maher CG, Sherrington C, Herbert RD, et al. Reliability of the PEDro scale for rating quality of randomized controlled trials. Phys Ther 2003;83:713–721.
  5. Verhagen AP, Scholten-Peeters GG, de Bie RA, Bierma-Zeinstra SM. Conservative treatments for whiplash. Cochrane Database Syst Rev 2004;(1):CD003338.
  6. Van den Ende CH, Vliet Vlieland TP, Munneke M, Hazes JM. Dynamic exercise therapy in rheumatoid arthritis: a systematic review. Br J Rheumatol 1998;37:677–687.
  7. Green S, Buchbinder R, Hetrick S. Physiotherapy interventions for shoulder pain. Cochrane Database Syst Rev 2003;(2):CD004258.
  8. Handoll HH, Sherrington C, Parker MJ. Mobilisation strategies after hip fracture surgery in adults. Cochrane Database Syst Rev 2004;(4):CD001704.
  9. Ada L, Foongchomcheay A, Canning C. Supportive devices for preventing and treating subluxation of the shoulder after stroke. Cochrane Database Syst Rev 2005;(1):CD003863.
  10. Pollock A, Baer G, Pomeroy V, Langhorne P. Physiotherapy treatment approaches for the recovery of postural control and lower limb function following stroke. Cochrane Database Syst Rev 2003;(2):CD001920.
  11. Hayden J, Tulder M, Malmivaara A, Koes B. Exercise therapy for treatment of non-specific low back pain. Cochrane Database Syst Rev 2005;(3):CD000335.
  12. Dagfinrud H, Kvien TK, Hagen KB. Physiotherapy interventions for ankylosing spondylitis. Cochrane Database Syst Rev 2004;(4):CD002822.
  13. Barclay-Goddard R, Stevenson T, Poluha W, et al. Force platform feedback for standing balance training after stroke. Cochrane Database Syst Rev 2004;(4):CD004129.
  14. Milne S, Brosseau L, Robinson V, et al. Continuous passive motion following total knee arthroplasty. Cochrane Database Syst Rev 2003;(2):CD004260.
  15. Jadad AR, Moore RA, Carroll D, et al. Assessing the quality of reports of randomized clinical trials: is blinding necessary? Control Clin Trials 1996;17:1–12.
  16. Teasell RW, Foley NC, Bhogal SK, Speechley MR. An evidence-based review of stroke rehabilitation. Top Stroke Rehabil 2003;10:29–58.
  17. Bhogal SK, Teasell RW, Foley NC, Speechley MR. Quality of the stroke rehabilitation research. Top Stroke Rehabil 2003;10:8–28.
  18. Bhogal SK, Teasell RW, Foley NC, Speechley MR. The PEDro scale provides a more comprehensive measure of methodological quality than the Jadad scale in stroke rehabilitation literature. J Clin Epidemiol 2005;58:668–673.
  19. Cohen JA. A coefficient of agreement for nominal scales. Educ Psychol Meas 1960;20:37–46.
  20. Shea B, Wells G, Cranney A, et al. Calcium supplementation on bone loss in postmenopausal women. Cochrane Database Syst Rev 2003;(4):CD004526.
  21. Muir KW, Lees KR. Excitatory amino acid antagonists for acute stroke. Cochrane Database Syst Rev 2003;(3):CD001244.
  22. Bacaltchuk J, Hay P. Antidepressants versus placebo for people with bulimia nervosa. Cochrane Database Syst Rev 2003;(4):CD003391.
  23. Prasad K, Shrivastava A. Surgery for primary supratentorial intracerebral haemorrhage. Cochrane Database Syst Rev 2000;(2):CD000200.
  24. Hay PJ, Bacaltchuk J. Psychotherapy for bulimia nervosa and binging. Cochrane Database Syst Rev 2003;(1):CD000562.
  25. Bonaiuti D, Shea B, Iovine R, et al. Exercise for preventing and treating osteoporosis in postmenopausal women. Cochrane Database Syst Rev 2002;(3):CD000333.
  26. Kunz R, Oxman AD. The unpredictability paradox: review of empirical comparisons of randomised and non-randomised clinical trials. BMJ 1998;317:1185–1190.
  27. Fleiss JL. Statistical Methods for Rates and Proportions. 2nd ed. New York, NY: John Wiley & Sons; 1981.
  28. Altman DG. Practical Statistics for Medical Research. London, England: Chapman & Hall; 1990:403–409.
  29. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics 1977;33:159–174.
  30. Schulz KF, Grimes DA. Allocation concealment in randomised trials: defending against deciphering. Lancet 2002;359:614–618.



Copyright © 2006 by the American Physical Therapy Association.