Movement Continuum Theory |
DD Allen, PT, PhD, is Adjunct Associate Professor, Department of Physical Therapy, Samuel Merritt College, Oakland, Calif, and Post-Doctoral Fellow, Health and Disability Research Institute, Boston University, Boston, Mass
Address all correspondence to Dr Allen at: allendianed{at}gmail.com
Submitted July 12, 2006;
Accepted March 1, 2007
| Abstract |
|---|
Subjects: More than 300 adult volunteers with various movement levels completed the 24-item questionnaire.
Methods: Item response theory methods were used to create the MAM and gather evidence of content and construct validity, test-retest and other types of reliability, and concurrent validity with the California Functional Evaluation instrument and self-acknowledgement of movement problems.
Results: The intraclass correlation coefficient for test-retest reliability was .92. Person separation reliability was .98. Correlation (r) with the California Functional Evaluation instrument was .76. Respondents who denied having movement problems perceived a significantly higher level of movement ability than those who claimed to have a little, some, or a lot of movement problems in the preceding week.
Discussion and Conclusion: The MAM shows promise for documenting perceived movement ability across ability levels and diagnoses.
| Introduction |
|---|
The profession needs a standardized way to assess outcomes across populations while focusing on physical therapy–specific contributions. The concept of movement is relevant across diagnostic groups and ability levels and is central to the work of physical therapists. A new measure focusing on movement may help to fill the gap left by current methods of measuring outcomes. This article describes the creation of the Movement Ability Measure (MAM) as a potential measure of the effectiveness of physical therapy intervention. Initial evidence of the validity and reliability of data obtained with the MAM also is presented.
When beginning to create a new instrument, a measurer may find, modify, or generate a theory that will guide the measurement choices. To guide choices about a new measure of physical therapy outcomes, the theory must have suitable relevance to the practice of physical therapy. The Movement Continuum Theory (MCT) of physical therapy1 meets this requirement. The MCT presents 3 general and 6 physical therapy principles that link movement science with individual movement capability and interventions provided in clinical practice (Tab. 1). In summary, movement, defined as an actual change in position, occurs at multiple interacting levels along a continuum from microscopic to the level of a person acting in society. Each level is influenced by physical, social, psychological, and environmental factors. Clinical interventions generally have their entry points at the tissue level or higher, but because the levels interact, interventions can affect molecular and cellular movement as well as body part and person movement. The MCT specifies that each person has maximum, current, and preferred movement capabilities. If a physical therapist successfully addresses movement problems with a patient, then current movement capability will increase, and the gap between current and preferred movement capabilities will narrow.1
The MCT provided a basis for the concepts in the new instrument, but an additional theory provided the methods for instrument development and evaluation. The procedures used in this study were based on item response theory (IRT) methods.2 Item response theory methods base measurements on mathematical relationships between people's abilities and the difficulties of the questions or tasks (items) that measure them, thus facilitating the direct assessment of constructs.3 Because the objective of the new instrument is direct assessment of the construct of movement, IRT methods were a good choice for this study. The mathematical complexity inherent in IRT methods, however, can hinder those who are unfamiliar with them. For the purposes of this study, analyses and results are summarized or depicted graphically, and more technical details are reported under the subheading "IRT Analyses." Readers interested in a general overview of IRT methods, particularly those used in this study, may consult the explanations by Wilson and colleagues2,4,5 and Wright and Masters.6 Following is an outline of the basic sequence for developing and initially testing an outcome instrument for assessing the effectiveness of intervention. In the "Method" section, the procedure and IRT methods used for developing and testing the MAM are described.
Before developing the instrument, the measurer defines more specifically what construct to measure, perhaps movement ability. Specifying the construct includes envisioning what movement might look like when a person's ability is low or high on the construct and what kinds of questions or tasks might best assess abilities at low and high ends of the construct. Upon envisioning movement abilities, the measurer can make a physical representation of the construct, placing different abilities and ways of assessing them in order as low, medium, or high locations on a hypothetical ruler or construct map.2 The construct map consists of a line with labeled locations on 2 sides of the line. The line is like the edge of a ruler, indicating less to more of the construct. The locations on the ruler side of the line represent units of measure, such as inches or centimeters on a ruler; in this study, the units will represent ordered ways of assessing movement abilities. The locations on the other side of the line represent the abilities being measured. As a whole, the construct map is a hypothesis: this is the order that the measurer thinks the abilities and assessments have on the construct. During later testing of the new instrument, comparing the hypothesized order with collected empirical data can provide evidence of construct validity.
Once ready to develop the instrument, the measurer generates and formats specific questions or tasks to provide detail to the assessment of people's abilities. These questions or tasks must cover the expected content and range of people's abilities while remaining practical with regard to time and complexity. The measurer hypothesizes the relationship between the questions or tasks and the abilities of the potential subjects. Later comparison of the hypothesized relationship with collected empirical data can provide evidence of content as well as construct validity.2
Once questions or tasks have been generated and formatted, the measurer assesses the resulting instrument. Assessment requires review by people who know the content, people who know measurement principles, and people who can represent the intended users. The instrument may undergo several revisions at this point. Finally, the measurer tests the revised instrument for its psychometric properties with a heterogeneous population that represents the span of the construct. This last testing stage provides evidence of the reliability and validity of the instrument through the use of methods appropriate to the instrument.
The forms and methods for assessing reliability and validity vary greatly, depending on the measure and the use to which it will be put. The projected use of an outcome measure for assessing the effectiveness of physical therapy intervention in various patient groups would be as follows. Patients would be assessed with the measure at the initiation of a course of physical therapy intervention. Patients then would undergo the course of physical therapy intervention, and data would be collected with the same measure upon discharge for comparison with the initial data.
For a physical therapy outcome measure, the forms of reliability assessment should include those that indicate likely sources and sizes of measurement errors with the intended use7: test-retest reliability, internal consistency,8 and person separation reliability.6 High test-retest reliability means that people with unchanging abilities will show the same data when the measure is applied a second time. Having unchanged data when abilities do not change increases confidence that a change in the data actually indicates a change in a person's ability rather than measurement error. High internal consistency means that data obtained on the items relate to each other and increases confidence that the items assess a single construct or highly related constructs. High person separation reliability means that apparent differences between people on the measure are less likely a result of errors in measurement and increases confidence that people who get different "scores" on the measure actually have separate abilities. Low values on any of these forms of reliability assessment indicate larger measurement errors and thus less confidence that comparisons actually can reveal true differences.
For a new physical therapy outcome measure, the forms of validity assessments should indicate the content of the measure, establish the measure's relationship to the newly defined construct, and compare the measure with other measures or constructs. In terms used in the 1999 Standards for Educational and Psychological Testing,7 these validity assessments incorporate evidence about instrument content, internal structure, and external variables (ie, relationship to other measures). In terms used in the Guide to Physical Therapist Practice,8 this validity evidence includes content, construct, concurrent, convergent, and divergent forms of validity.
With IRT methods and the instrument development steps described here, support for construct and content validity starts with instrument development. Documenting the relationships between the developing measure and the construct and between the content range of the measure and the intended population provides the initial evidence of validity. Additional empirical evidence of construct and content validity can be obtained through comparisons of empirical responses on the new measure with the construct map that was created before the measure was formed. Concurrent validity evidence can be obtained by comparing data from the new measure with alternative indications of health and movement problems from separate questions, items, or measures. Convergent and divergent validity can be obtained by comparing data from the new measure with data obtained from measures of alternative constructs, such as general health or functional ability. The California Functional Evaluation instrument (CAFE-40)9 proved to be a useful comparison measure for this study because it addresses both general health and functional ability, uses a self-report format, and focuses on outpatient clinical populations. The focus on outpatients means it should have less of a ceiling effect than many similar health-related measures.
| Method |
|---|
To specify the construct further, people with very low perceived movement capability were envisioned as being unable to move (ie, paralyzed), whereas people with high perceived movement capability might move competitively, like athletes or star performers of physically demanding work or arts. Between the lowest and highest ability levels, gradations of movement capability might progress from needing help for daily activities to engaging or participating in movement beyond routine activities.10 The ability side of the construct map is shown in Figure 1. Six gradations or numbered levels show ordered progression, and color coding like a rainbow shows the potential for the overlap of movement capability among levels.
The resultant MAM (Appendix) asks for respondents' perceptions of current and preferred movement capabilities in 6 movement dimensions. The MAM response choices are 1 to 6 for "now" responses and 1 to 6 for "would like" responses for each item, with 1 assigned to the lowest movement levels selected and 6 assigned to the highest. The MAM consists of 24 items, with 4 items representing each of the 6 dimensions. The total possible raw score is 144 if a person chooses the highest item response levels across all 24 items. For the purposes of this study, only "now" responses were analyzed, and all analyses were performed on the basis of the numbers that people indicated for the items without regard to separate dimensions.
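The raw-score arithmetic described above can be sketched in a few lines of Python. This is only an illustration: the function and variable names are hypothetical, missing responses are not handled, and the study itself analyzed the responses with IRT methods rather than raw sums.

```python
# Sketch of raw scoring for the "now" responses; all names are hypothetical.
N_ITEMS = 24                  # 4 items for each of 6 movement dimensions
MIN_LEVEL, MAX_LEVEL = 1, 6   # response levels available for every item

def raw_now_score(now_responses):
    """Sum the 24 'now' responses after checking each is a level from 1 to 6."""
    if len(now_responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} responses, got {len(now_responses)}")
    for level in now_responses:
        if not (isinstance(level, int) and MIN_LEVEL <= level <= MAX_LEVEL):
            raise ValueError(f"response level out of range: {level!r}")
    return sum(now_responses)

highest_possible = raw_now_score([MAX_LEVEL] * N_ITEMS)  # 24 * 6
lowest_possible = raw_now_score([MIN_LEVEL] * N_ITEMS)   # 24 * 1
```

With every item answered at level 6 the sum is 144, matching the total stated in the text; with every item at level 1 it is 24.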
The writing of the MAM is rated at a grade level of 8.2 (as designated by Microsoft Word software*). Although it is relatively easy to comprehend, its usefulness is restricted to people who have the mental capacity to conceive of the abstract ideas of "current" and "preferred" movement capabilities and to people who have the persistence to complete the measure. For people with severely restrictive physical disabilities or for people who do not read English, an assistant might help in completing the measure, to the extent of documenting the perceptions of the respondents themselves. Caregivers might complete it on behalf of children or patients who do not have the mental capacity to complete it for themselves, but these populations were not addressed in evaluating the MAM for this study.
Psychometric Testing of the MAM
Because the intended population for the MAM could have a wide range of movement abilities, attempts were made to recruit volunteers from a heterogeneous mixture of people. People volunteering to complete the questionnaire were recruited through personal and professional contacts, religious organizations, a senior activity event, a college sports team, outpatient physical therapy clinics, and community groups. In addition to the questionnaire, respondents completed a cover sheet of information about health and movement, including responses to a question asking whether they were healthy (either with or without accompanying medical conditions) and a question asking whether they had noted movement problems in the preceding week. Most respondents also completed the CAFE-40. Some respondents completed both questionnaires twice, at an interval of about 2 weeks, for test-retest analyses. Respondents were informed that completing and returning the questionnaire constituted consent for their (anonymous) responses to be included in the study.
A total of 318 respondents completed the MAM. Of those, 283 also completed the CAFE-40,9 and 34 completed both questionnaires twice for the test-retest assessment. The mean age of the respondents who completed the MAM was 55 years (minimum and maximum were 18 and 101, respectively), 206 were women, and 180 acknowledged at least a little movement difficulty in the preceding week. Only 39 respondents stated that they were undergoing or about to start physical therapy intervention at the time of completing the questionnaire. Most respondents had no acute medical conditions at the time of completing the questionnaire, but ongoing diagnoses included musculoskeletal (including upper-extremity, lower-extremity, or spine problems), neurological, endocrine, and cardiovascular pathologies.
For the group of 34 respondents who participated in the test-retest portion of the study, the mean age was 54 years, with a minimum and a maximum of 19 and 78 years, respectively. Seventeen respondents were women. None of these people were undergoing or about to start physical therapy intervention.
Responses to "now" questions were analyzed by use of IRT methods and ConQuest software.12
People's abilities and item response levels were estimated as locations on an empirically derived scale for self-perception of movement ability. The units for the empirically derived scale are logits (natural log of the odds), indicating the probability of responses to different movement ability levels within the 24 items given the perceived ability of the person and the difficulty of the item level. More technical details of methods and results are reported under the subheading "IRT Analyses."
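As a rough illustration of what a logit scale means, the sketch below uses the simplest (dichotomous) Rasch form, in which the probability of a response depends only on the difference between the person location and the item location. The study used a polytomous model with 6 response levels per item, so this is a simplification, and the function names are hypothetical.

```python
import math

def logit(p):
    """Natural log of the odds p / (1 - p), for 0 < p < 1."""
    return math.log(p / (1.0 - p))

def inverse_logit(x):
    """Probability corresponding to a log-odds value x."""
    return 1.0 / (1.0 + math.exp(-x))

def rasch_probability(person_location, item_location):
    """Dichotomous Rasch model: the probability depends only on the
    difference between person and item locations, both in logits."""
    return inverse_logit(person_location - item_location)

p_equal = rasch_probability(0.0, 0.0)      # person and item at the same location
p_one_above = rasch_probability(1.0, 0.0)  # person 1 logit above the item
```

When person and item share a location the probability is .5; a person located 1 logit above the item has a probability of about .73.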
For reliability testing, ConQuest12 produced both internal consistency (Cronbach α) and person separation reliability (through the use of weighted likelihood estimation of respondent locations) calculations. Respondent locations on the empirical scale were compared at test and retest with the intraclass correlation coefficient (ICC, model 3). Form k=24 was used for the ICC because the person locations were estimated on the basis of responses across 24 items.13
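For readers unfamiliar with the internal consistency statistic, the textbook formula for Cronbach α can be sketched as follows. This is not ConQuest's implementation; it assumes complete data (no missing responses), and the tiny data set is invented purely to exercise the function.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Textbook Cronbach alpha.

    scores: list of respondents, each a list of k item responses
    (complete data assumed; at least 2 respondents and 2 items).
    """
    k = len(scores[0])
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    total_variance = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Invented data: 4 respondents, 3 items, perfectly consistent responses.
hypothetical_scores = [[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]]
alpha = cronbach_alpha(hypothetical_scores)
```

Because every invented respondent answers all items identically, the items covary perfectly and α reaches its upper limit of 1.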
For validity testing, the empirically derived scale for self-perception of movement ability was depicted graphically in a Wright map.2 A Wright map shows the relationship between the locations of person abilities and item response levels. A Wright map is an empirical analog to the hypothetically derived construct map of Figures 1 and 2. Content and construct validity evidence was obtained by examining the Wright map. Respondent locations were expected to cluster in the area equivalent to performing normal activities plus extra activities, with some spread across the scale and fewer respondents at the lowest and highest levels. Item response levels were expected to match the levels of the respondents and extend both higher and lower than these respondent levels. Item response levels were expected to be ordered as on the construct map and distinct from each other rather than too close together.
As another test of validity, the respondent locations provided by ConQuest were aggregated into groups differentiated by the responses to the health and movement questions. The question about health had a binary response set; therefore, the average respondent locations for the "healthy" and "not-healthy" groups were compared by use of a 2-tailed t test, with the alpha level set at .05. The demographic question asking respondents "Have you had movement problems this week?" had 4 ordinal responses—"no, not at all," "yes, a little," "yes, some," or "yes, a lot"—scored 0 to 3, respectively. The average respondent locations for the 4 groups were compared by use of analysis of variance. If warranted, 3 contrasts were planned between adjacent pairs of groups by use of the Bonferroni t test to determine the minimum significant difference, with an overall alpha value of .05.13 Finally, responses to the MAM and to the CAFE-409 and its 2 components—general health and functional ability—were correlated to assess convergent and divergent validity.
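The Bonferroni adjustment for the 3 planned contrasts amounts to dividing the overall alpha among them. A minimal sketch, with hypothetical variable names:

```python
overall_alpha = 0.05

# Adjacent pairs of the 4 ordinal response groups (scored 0 to 3).
planned_contrasts = [
    ("no, not at all", "yes, a little"),
    ("yes, a little", "yes, some"),
    ("yes, some", "yes, a lot"),
]

# Each contrast is tested at the overall alpha divided by the number of contrasts.
per_contrast_alpha = overall_alpha / len(planned_contrasts)
```

Each contrast is therefore evaluated at an alpha of about .0167.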
IRT Analyses
ConQuest12 estimated respondent and item response level locations on the basis of the data and different models specified. ConQuest also produced statistics to help determine how well the data fit the proposed models. Rasch 1-parameter IRT models were chosen for analyzing the data because the stability of item response locations in the 1-parameter model allows mapping with the Wright map and subsequent comparison with the construct map for this new measure.2 Respondent locations were estimated by use of weighted likelihood estimates. Item response level locations for the Wright map were depicted as thresholds, the locations at which the probability of choosing the number representing that movement ability level or higher was .5.12
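The threshold definition above (the location at which the probability of responding at a given level or higher is .5) can be illustrated with a small sketch of the partial-credit model. This is not ConQuest's estimation code: the step locations are invented, the helper names are hypothetical, and categories are indexed 0 to 5 to stand for MAM response levels 1 to 6.

```python
import math

def pcm_category_probabilities(theta, step_locations):
    """Partial-credit model: probabilities of categories 0..m for one item.

    step_locations[j] is the location (logits) of the step from
    category j to category j + 1.
    """
    # Cumulative sums of (theta - delta_j); the empty sum for category 0 is 0.
    exponents = [0.0]
    for delta in step_locations:
        exponents.append(exponents[-1] + (theta - delta))
    largest = max(exponents)  # subtracted for numerical stability
    weights = [math.exp(e - largest) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]

def prob_at_or_above(theta, step_locations, k):
    """Probability of responding in category k or higher."""
    return sum(pcm_category_probabilities(theta, step_locations)[k:])

def threshold(step_locations, k, lo=-30.0, hi=30.0, tol=1e-9):
    """Location where P(category k or higher) = .5, found by bisection."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if prob_at_or_above(mid, step_locations, k) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

hypothetical_steps = [-2.0, -0.5, 0.3, 1.1, 2.4]  # 5 steps between 6 levels
thresholds = [threshold(hypothetical_steps, k) for k in range(1, 6)]
```

Thresholds defined this way are always ordered from lowest to highest within an item, which is what allows them to be laid out along a Wright map. For an item with a single step, the threshold coincides with the step location.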
Both rating scale and partial-credit models6 were tested in this study; the partial-credit model fit the data better, according to the G² likelihood ratio test (χ²=570, df=92, P<.0001), indicating that the item response levels were different across items rather than the same. With the partial-credit model, all but 1 item (out of 24) and 1 item step (out of 120 steps, or borders between 2 response levels) fit the data within the criterion boundaries: estimated locations must be associated with a weighted mean square between 0.75 and 1.34 and a weighted mean t statistic between –1.96 and +1.96.4 The exceptions included the first adaptability item (weighted mean square=1.37 with t=3.8) and the last item step of the third speed item (weighted mean square=1.38 with t=2.0). The 1-parameter, partial-credit, unidimensional model was used for estimating respondent and item locations and their standard errors for the purposes of this study. Respondent locations were constrained to have a mean of 0 logit; the standard deviations of respondent locations and all item locations were unconstrained. The item separation reliability was .98,12 indicating high confidence (small error) in the estimated locations of the items in this study.
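The fit criterion stated above can be expressed as a simple check. The sketch below applies it to the two reported exceptions; the function name and dictionary are hypothetical, and the fit statistics themselves would come from the IRT software.

```python
MNSQ_BOUNDS = (0.75, 1.34)  # weighted mean square criterion from the text
T_BOUNDS = (-1.96, 1.96)    # weighted t statistic criterion from the text

def within_fit_criteria(weighted_mnsq, t_stat):
    """True when both statistics fall inside the stated boundaries."""
    return (MNSQ_BOUNDS[0] <= weighted_mnsq <= MNSQ_BOUNDS[1]
            and T_BOUNDS[0] <= t_stat <= T_BOUNDS[1])

# Values reported in the text for the two exceptions.
reported_exceptions = {
    "first adaptability item": (1.37, 3.8),
    "last step of third speed item": (1.38, 2.0),
}
fit_flags = {name: within_fit_criteria(mnsq, t)
             for name, (mnsq, t) in reported_exceptions.items()}
```

Both reported exceptions fall outside the boundaries on the weighted mean square and on the t statistic, so both are flagged as not fitting.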
| Results |
|---|
Content and Construct Validity
The Wright map showed respondent locations to the left and item response level thresholds to the right on a logit scale representing self-perception of movement ability. The Wright map shown in Figure 3 has been modified so that 5 brackets substitute for 120 item response level thresholds. Each bracket indicates the span of locations of the same response level thresholds across the 24 items. The respondents had estimated locations from about –8 to +6 logits, whereas the item response level thresholds had estimated locations from about –10 to +8 logits. Thus, the locations for the assessment units (item response levels) of the MAM extended both above and below the locations for the sample (respondents) in this study, evidence of content validity. The mean of the respondent locations, at 0 logit, corresponded to a movement ability level at which respondents perceived that they moved enough for normal activities plus extra activities.
Of the respondents who answered the question about movement problems in the preceding week, 134 reported none, 92 reported a little, 60 reported some, and 26 reported a lot of problems. The average respondent locations for scores 0, 1, 2, and 3 were 1.45, 0.07, –1.87, and –3.62 logits, respectively (Fig. 4). The differences between groups were found to be significant by analysis of variance. The planned contrast between adjacent pairs of groups with a minimum significant difference of 0.93 logit revealed that all pair-wise differences also were statistically significant, evidence of concurrent validity with self-reported movement problems.
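The comparison of adjacent groups can be checked directly from the reported values. The sketch below uses only the group sizes, group means, and minimum significant difference given in the text; the variable names are hypothetical.

```python
# Reported group sizes and mean respondent locations (logits), keyed by score.
group_sizes = {0: 134, 1: 92, 2: 60, 3: 26}
group_means = {0: 1.45, 1: 0.07, 2: -1.87, 3: -3.62}
minimum_significant_difference = 0.93  # logits, as reported

respondents_answering = sum(group_sizes.values())

# Differences between adjacent groups: 0 vs 1, 1 vs 2, 2 vs 3.
adjacent_differences = {
    (score, score + 1): round(group_means[score] - group_means[score + 1], 2)
    for score in range(3)
}
all_exceed_msd = all(
    diff > minimum_significant_difference
    for diff in adjacent_differences.values()
)
```

The adjacent differences are 1.38, 1.94, and 1.75 logits, each larger than the minimum significant difference of 0.93 logit.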
| Discussion and Conclusion |
|---|
The range of item response levels covered the respondents well, with item response thresholds both above and below the range of respondent locations. The range comparison shows a lack of a ceiling or floor effect for these respondents. When considered with the systematic design that included input from structured interviews and review by content experts, this range is strong evidence of content validity. The response levels were distinct across the sample; the thresholds between movement ability levels were ordered as hypothesized in the construct map (compare Fig. 2 with the right-hand side of Fig. 3), which is strong evidence of construct validity. In raw score terms, moving up one movement ability level from levels 2 to 6 is roughly equivalent to responding to every one of the 24 MAM items at the next higher numbered response level (Tab. 2), as designed.
In addition to providing evidence of content and construct validity, the data in this study also provide evidence of good internal consistency and reliability, along with evidence of validity based on other indicators. Through their responses, participants indicated that their ideas of health, general health, and lack of movement problems in the preceding week aligned as expected with their idea of current movement ability. Their idea of function seemed to be distinct from, although moderately correlated with, their idea of movement ability.
Limitations of this study include the small numbers of respondents at the ceiling and floor of the MAM. Attempts were made to recruit respondents from a broad range of movement ability levels, including the upper range (from a college competitive sports team), the lower range (from a community senior activity event that included people with multiple disabilities), and a large range of adult ages (from 18 to 101 years). Despite these attempts, respondents at the far extremes of the construct were sparse. The inclusion of world-class athletes and people who have very limited movement as a result of paralysis or other severe impairments would strengthen the assessment of possible ceiling and floor effects of the MAM.
Although purposive sampling is used commonly in the testing of new instruments in order to obtain responses representing a broad range for the construct in question,13 comparison of such data to the general population is limited by potential biases. People who volunteer to complete a questionnaire such as the MAM may have a particular bias either toward movement as an important concept in their lives or toward documenting their problems with movement. Evidence of the latter was apparent in the number of respondents, more than one half, who reported at least a little movement trouble in the preceding week. This figure seems high for the general population, although lack of data on this construct from any other study restricts interpretation. Only 39 of the total number of respondents were undergoing or beginning a course of physical therapy intervention; therefore, having a stake in physical therapy outcomes was not a particular bias for participating in this study.
The length of the questionnaire hindered many people, as indicated by their comments. Making the response levels for each item explicit resulted in 11 pages of words for the respondents to read when health and movement questions and the CAFE-40 were included. A comparison of the amount of missing data in this study with that in a study of 2,914 patients taking the Medical Outcomes Study 36-Item Health Survey Questionnaire (SF-36),15 a health-related quality-of-life measure, revealed a slightly better response on the MAM. Only 1.96% of the responses on combined "now" and "would like" items for the MAM were missing, compared with 4.25% of the responses in the SF-36 data set (and 8.02% of the responses in the CAFE-40 data set for respondents in the present study). Differences between the 2 studies include the fact that people in the SF-36 study might have been generally sicker; on the other hand, the SF-36 is shorter, with fewer words than the MAM, although with 50% more items. Missing data could be interpreted to mean that subjects overlooked an item (or page of items), found it too difficult to answer, ran out of time, or lost interest or impetus midway through the MAM.
The MAM records only respondents' perceptions of movement ability, not quantitative measures of that movement. Like any self-report measure, the MAM depends on respondent insight and veracity. In addition, self-report measures depend on some consistent interpretation of the questions over the period of interest for assessment. For the MAM, the interpretation of "normal" or "everyday" activities certainly would change over a person's lifetime. Most people adjust their lifestyle to match whatever their movement ability allows. A note on a returned questionnaire put it this way:
I have modified my work and my activities due to age. I chose to change careers to a task which is reasonable for my age, knowing that the work I did before wasn't well suited for [an older person]. The sports I did at 30 were okay at 40 but don't work [as I grow older]. I gave up on technical rock climbing, surf kayaking and backpacking as I don't have the strength. But I still can walk, kayak—I just have to be more careful of tendons and falling.
Other respondents said, "I realized after I filled out the questionnaire that I had left all the activities I normally avoid out of consideration altogether," or simply, "this week with back trouble" as a caveat to the item responses. Thus, interpretation of the term "normal activities" can change, at least in the long term, with both benefits and possible drawbacks. Benefits include the ability of the MAM to apply to a wide variety of respondents whatever their time of life. The use of terms such as "normal activities" and "normal activities plus extra activities" allows the incorporation of respondent preference into self-perception of movement and the ways in which movement affects activity and participation. Drawbacks include the possible loss of reliability in "now" responses on the MAM if the time period for the assessment extends over life changes unrelated to intervention. Studies with appropriate comparison groups can help to control for this potential drawback.
In summary, the MAM should not be used as an absolute measure for comparing movement ability between 2 individuals, as quantitative measures of movement can be. The MAM should be used for comparisons within a particular individual over designated time periods or across groups of individuals completing the MAM on a single occasion or multiple occasions associated with an episode of care. Despite the limitations, the MAM shows promise as a way to document perceived movement ability across movement levels and diagnostic groups. With evidence of responsiveness, the MAM also may provide a record of change in perceived movement ability as a result of physical therapy intervention, thus adding to the profession's growing body of evidence of effectiveness.
| Appendix |
|---|
| Acknowledgments |
|---|
The Committee for the Protection of Human Subjects at the University of California, Berkeley, designated this study exempt from further review.
Data and constructs of this manuscript were presented in a platform presentation at the International Objective Measurement Workshop; April 6, 2006; Berkeley, Calif.
This article arose from the author's doctoral dissertation at the University of California, Berkeley, December 2005.
| Footnotes |
|---|
Australian Council for Educational Research, Hawthorn, Victoria, Australia.
| References |
|---|