Health Education Research Advance Access originally published online on July 18, 2006
Health Education Research 2006 21(Supplement 1):i58-i72; doi:10.1093/her/cyl054
© 2006 The Author(s).
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Using Rasch modeling to re-evaluate three scales related to physical activity: enjoyment, perceived benefits and perceived barriers

KC Heesch1,2,*, LC Mâsse3,5 and AL Dunn4,6

1 School of Human Movement Studies, University of Queensland, Brisbane, QLD 4072, Australia
2 Department of Health and Exercise Science, University of Oklahoma, Norman, OK 73019, USA
3 Health Promotion Research Branch, National Cancer Institute, Bethesda, MD 20892, USA
4 The Cooper Institute, Dallas, TX 75230, USA

* Correspondence to: K. C. Heesch. E-mail: kheesch{at}hms.uq.edu.au


    Abstract
Studies suggest that enjoyment, perceived benefits and perceived barriers may be important mediators of physical activity. However, the psychometric properties of these scales have not been assessed using Rasch modeling. The purpose of this study was to use Rasch modeling to evaluate the properties of three scales commonly used in physical activity studies: the Physical Activity Enjoyment Scale, the Benefits of Physical Activity Scale and the Barriers to Physical Activity Scale. The scales were administered to 378 healthy adults, aged 25–75 years (50% women, 62% Whites), at the baseline assessment for a lifestyle physical activity intervention trial. The ConQuest software was used to assess model fit, item difficulty, item functioning and standard error of measurement. For all scales, the partial credit model fit the data. Item content of one scale did not adequately cover all respondents. Response options of each scale were not targeting respondents appropriately, and standard error of measurement varied across the total score continuum of each scale. These findings indicate that each scale's effectiveness at detecting differences among individuals may be limited unless changes in scale content and response format are made.


    Introduction
For theory-based physical activity interventions, strategies are developed to influence constructs posited to be mediators of behavior change [1–4]. Some investigators have evaluated the success of theory-based interventions by assessing pre-intervention to post-intervention changes in these constructs along with changes in physical activity behavior, but they have not consistently found the constructs studied to have mediating roles [5].

To date, assessment of the psychometric properties of scales measuring these constructs has been limited. Without sound analyses of scale properties, it is unclear whether the inability of investigators to find a consistent mediating role of constructs [5] reflects poor measurement of constructs, poor translation of constructs into practical strategies or a lack of association between the hypothesized mediators and physical activity. To thoroughly evaluate the properties of scales measuring these constructs, advanced psychometric methods, such as Rasch modeling, are needed. In contrast to classical test theory (CTT), Rasch modeling can assess whether (i) the content of the scale items covers the range of respondents' perceptions about the construct (e.g. on a scale measuring physical activity benefits, are there items targeted toward those perceiving few benefits as well as toward those perceiving many benefits?), (ii) the response options are appropriate for the respondents and (iii) the standard error of the scale is maintained across the range of scale scores [6–10].

The purpose of this study was to use Rasch modeling to assess the psychometric properties of scales measuring constructs thought to be mediators of physical activity. To limit the breadth of investigation, we have focused the analyses on three scales measuring perceptions. The scales were (i) the Benefits of Physical Activity Scale [11], (ii) the Barriers to Physical Activity Scale [11] and (iii) the Physical Activity Enjoyment Scale (PACES) [12]. Findings from studies evaluating the properties of these scales using CTT indicated that among college students and youth, each had adequate test–retest reliability and construct validity [11–16]. However, the few studies that assessed the mediating role of constructs measured by these scales had inconsistent findings [5], suggesting that measurement problems may exist for scales measuring these constructs.


    Methods
Participants
Baseline data of participants recruited for Project Physically Ready for Invigorating Movement Everyday, a 24-month randomized physical activity trial [17], were used to re-evaluate the properties of the three scales. These participants were 378 healthy adults who were not meeting national guidelines for physical activity. Their mean age was 49.8 years (SD = 9.6), and half the sample (49.5%) were women. Most were White (62.4%), but African–Americans (19.6%), Latinos (14.8%), Native Americans (1.3%) and Asian–Americans (0.8%) were also represented. Just 1.1% of respondents reported ‘other’ as their race/ethnicity. For the 275 who reported years of education, the median was 16.0 years (M = 15.0, SD = 2.7). More details about their characteristics have been reported elsewhere [17].

Protocol
Potential participants were recruited through a variety of sources including the local media and word of mouth. They were screened by telephone and if eligible, gave written informed consent and completed baseline measures at an orientation session. After completing the baseline measures, participants were randomized into one of two lifestyle physical activity interventions or into a standard care group. The same measures were administered again at 6-month and 24-month follow-ups. For this study, the baseline data were analyzed, and data from all three study groups were included in the analysis.

Measures
Enjoyment
PACES is an 18-item, self-administered scale developed by Kendzierski and DeCarlo [12]. The scale was initially developed to measure enjoyment toward exercise, but was modified to assess enjoyment toward physical activity. Respondents were asked to rate their current feelings about physical activity (see Appendix A) using a seven-point semantic differential approach. As done in the original development of the scale [12], the scale was analyzed as a Likert scale. A total scale score was computed by summing responses to all items after recoding some items so that a high score indicated high enjoyment, whereas a low score indicated little enjoyment. In two samples of college students, Kendzierski and DeCarlo [12] found the exercise measure to have high internal consistency. In another study [13], scale scores were found to correlate with stage of motivational readiness to change (r = 0.54) and self-efficacy (r = 0.37). In two studies of youth [18, 19], the structure of the scale was examined, but the unidimensionality of the scale was confirmed in only one of those studies. For that study, the scale was adapted for the population under study [19].

Benefits
The Benefits of Physical Activity Scale is a 14-item self-administered questionnaire. It was developed originally as a 10-item scale [11], but has been modified and expanded over time to its current version [15, 16, 20]. Respondents were asked to indicate if they expected any positive psychological or physical outcome from participating in regular physical activity or sports (see Appendix B). Responses were on a five-point Likert scale ranging from ‘strongly disagree’ to ‘strongly agree’. A scale score was computed by summing the responses to all items with a high score representing perceptions of many benefits. Scores on the original scale correlated significantly with reports of exercise in adults (r = 0.24) [11]. In a study of college students using the 14-item scale [16], 1-week test–retest reliability was found to be moderate (r = 0.55). In another study of college students using the current version [15], internal consistency (Cronbach's α = 0.88) and 1-week test–retest reliability (r = 0.85) were found to be high.

Barriers
The Barriers to Physical Activity Scale is a 25-item self-administered measure of perceived barriers to performing physical activities (see Appendix C). It was originally developed as a 15-item scale [11], but has been modified and expanded over time [16, 20]. For each item, respondents were asked if the situation or perception described prevented engagement in physical activities. Responses were on a five-point Likert scale (0 = ‘never’ to 4 = ‘very often’). The items were summed to create a scale score with higher scale scores representing perceptions of more barriers to performing physical activities. The scale's developers found scale scores to be significantly and inversely correlated with exercise in adults (r = –0.22) [11] and to change significantly with changes in exercise [21]. In college students, the 1-week test–retest reliability of the revised scale was found to be adequate (r = 0.79) [16]. Information about the internal consistency of the scale has been lacking in the literature.

Analyses
Preliminary analyses
For all scales, descriptive statistics were computed as well as Cronbach's alpha for assessing internal consistency. An alpha value of 0.70 or greater indicated that a scale had adequate internal consistency [22].
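
As a minimal sketch of the internal consistency check described above, Cronbach's alpha can be computed from a respondents-by-items score matrix as follows. The function name `cronbach_alpha` and the toy data are hypothetical and are not taken from the study; only the 0.70 cut-off comes from the text.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(scores[0])                     # number of items
    item_columns = list(zip(*scores))      # one tuple of scores per item
    sum_item_var = sum(variance(col) for col in item_columns)
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum_item_var / total_var)


# Hypothetical toy data: 5 respondents x 4 items scored 0-4.
toy_scores = [
    [0, 1, 1, 0],
    [1, 1, 2, 1],
    [2, 2, 2, 3],
    [3, 3, 4, 3],
    [4, 3, 4, 4],
]
alpha = cronbach_alpha(toy_scores)
adequate = alpha >= 0.70   # cut-off for adequate internal consistency used in the text
```

Alpha rises as the items covary more strongly relative to their individual variances, which is why it is read here as an index of internal consistency rather than of unidimensionality.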

Testing Rasch modeling assumptions
Using exploratory factor analysis, the unidimensionality assumption was tested by forcing a one-factor solution. Unidimensionality was confirmed if the eigenvalue plot (i.e. scree plot) showed one dominant first factor, the solution explained at least 20% of the variance and the factor loadings were >0.30 [23].

Rasch modeling analyses
Rasch modeling was performed using the ConQuest software [24], which uses the Rasch family of logistic models. A Rasch analysis was preferred over an item response modeling analysis given the properties of the Rasch model (e.g. scaling the items and respondents on the same scale) and the sample size of our data set [6–9].

The first step in the analysis was to determine the best-fitting Rasch model by comparing the fit of the rating scale [25] and partial credit [26] models. The rating scale model assumes that the distances between each pair of ordinal response options are the same across the items. In contrast, distances between ordinal response options are not assumed to be equal across all items in the partial credit model. To compare fit of the two models, a likelihood ratio test was performed on the difference between the two models' deviance parameters (the deviance equals minus twice the log-likelihood, and the difference in deviance is assumed to have a chi-square distribution). In addition, the number of items that fit the model was assessed. Fit was determined by computing weighted mean square fit statistics for each item, which indicate whether residuals vary as much as expected given the observed distribution [27]. Items for which the weighted fit statistic was <0.75 or >1.33 and for which the weighted t-statistic was <–2.00 or >2.00 were considered to be fitting poorly [28]. The model with the fewest poorly fitting items was deemed to provide a better fit.
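
The two decision rules in this step can be sketched as below. This is an illustration only: the function names and the example fit values are hypothetical, and the fit statistics themselves would come from the Rasch software output rather than from this code. Only the cut-offs (0.75, 1.33 and ±2.00) are taken from the text.

```python
def deviance_difference(deviance_rating_scale, deviance_partial_credit):
    """Likelihood ratio statistic: the drop in deviance when moving to the
    less constrained partial credit model (referred to a chi-square
    distribution with df equal to the number of extra parameters)."""
    return deviance_rating_scale - deviance_partial_credit


def is_poorly_fitting(mnsq, t, low=0.75, high=1.33, t_crit=2.00):
    """Flag an item only if BOTH the weighted mean square and its
    t-statistic fall outside the acceptable ranges."""
    return (mnsq < low or mnsq > high) and abs(t) > t_crit


# Hypothetical (weighted MNSQ, t) pairs, for illustration only.
items = {"item_a": (1.02, 0.4), "item_b": (1.45, 3.1), "item_c": (1.40, 1.2)}
flagged = [name for name, (mnsq, t) in items.items() if is_poorly_fitting(mnsq, t)]
```

Here `item_c` is not flagged even though its mean square exceeds 1.33, because its t-statistic stays inside ±2.00; the rule requires both conditions.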

As the next step, the Wright item-person map was visually inspected. This map provides both the items by thresholds (the location at which the cumulative probability of selecting one response option of an item versus all previous response options reaches 0.50) and the respondents on the same logit scale. By convention, the Rasch model transforms the raw scores to log odds ratios on a common interval, with 0.00 being allocated to the mean. The Wright map is useful for determining if the items are appropriate for the respondents. Ideally, a scale should have items by thresholds distributed in a fairly uniform way along the Rasch scale continuum where persons are located, indicating a scale contains content appropriate for all respondents.
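
To make the threshold definition concrete, the sketch below locates, for one item, the logit at which the cumulative probability of responding in a given option or higher reaches 0.50. It assumes the standard partial credit model parameterization with step parameters in logits; the step values and function names are hypothetical, and the software used in the study may compute its thresholds differently.

```python
import math


def pcm_probs(theta, steps):
    """Partial credit model category probabilities for one item.
    `steps` holds the step parameters (one per adjacent option pair)."""
    cum = [0.0]
    for d in steps:
        cum.append(cum[-1] + (theta - d))
    top = max(cum)                                 # guard against overflow
    weights = [math.exp(c - top) for c in cum]
    total = sum(weights)
    return [w / total for w in weights]


def threshold(steps, k, lo=-10.0, hi=10.0, tol=1e-9):
    """Logit at which P(response >= option index k) reaches 0.50 (bisection)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if sum(pcm_probs(mid, steps)[k:]) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical step parameters for a seven-option item (six steps).
steps = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]
thresholds = [threshold(steps, k) for k in range(1, len(steps) + 1)]
```

An item with seven response options yields six thresholds, and plotting these against the person locations on the same logit axis gives one row of the item-person map.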

Functioning of the response format was visually assessed by examining item characteristic curves (ICCs). One ICC graph was plotted for every item. On a graph, one curve was plotted for each response option (e.g. one curve for each of the seven response options of the enjoyment scale), and this curve showed the probability of selecting the response at each logit along the Rasch scale continuum (i.e. at each respondent's total scale score converted into a logit score). The range on the Rasch scale continuum where a curve was higher than all other curves in the graph signified where a response option had a greater probability than all other response options of being selected. Failure of a curve to rise higher than curves of other response options anywhere along the Rasch scale continuum indicated that the option never had the greatest probability of being selected for any respondent.
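
The visual check described above can also be approximated numerically. The following sketch, which assumes the standard partial credit model and uses hypothetical step parameters, evaluates the response option curves on a grid of logits and reports any option that is never the most probable one. It is not the procedure used in the study, which relied on inspecting the plotted curves.

```python
import math


def pcm_probs(theta, steps):
    """Partial credit model category probabilities for one item."""
    cum = [0.0]
    for d in steps:
        cum.append(cum[-1] + (theta - d))
    top = max(cum)                                 # guard against overflow
    weights = [math.exp(c - top) for c in cum]
    total = sum(weights)
    return [w / total for w in weights]


def options_that_never_peak(steps, lo=-3.0, hi=3.0, n_points=61):
    """Return option indices that are never the most probable response
    at any point on a grid of logits between `lo` and `hi`."""
    peaked = set()
    for i in range(n_points):
        theta = lo + (hi - lo) * i / (n_points - 1)
        probs = pcm_probs(theta, steps)
        peaked.add(probs.index(max(probs)))
    return [k for k in range(len(steps) + 1) if k not in peaked]


ordered = [-2.0, -0.5, 0.5, 2.0]      # hypothetical: every option peaks somewhere
disordered = [-1.0, 0.5, 0.0, 1.5]    # hypothetical: the middle option is squeezed out
```

With the hypothetical `disordered` steps, the middle option (index 2) is always less probable than one of its neighbours, which is the pattern the text describes as a curve that never rises above the others.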

The last step in the analyses was to estimate the standard errors of measurement for each scale. In Rasch modeling, standard errors are conditional upon location on the Rasch scale continuum [9], meaning that they were conditional upon the amount of enjoyment, benefits and barriers perceived by respondents in this study. Because the standard errors are conditional, measurement precision can be assessed at each logit along the Rasch scale continuum.
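
A minimal sketch of why the standard error depends on location: under a Rasch-family model, the information an item contributes at a given logit equals the variance of its score at that logit, and the standard error of a maximum-likelihood person estimate is the reciprocal square root of the summed information. The step parameters and scale lengths below are hypothetical, and the estimator used by the software in the study is an assumption here.

```python
import math


def pcm_probs(theta, steps):
    """Partial credit model category probabilities for one item."""
    cum = [0.0]
    for d in steps:
        cum.append(cum[-1] + (theta - d))
    top = max(cum)                                 # guard against overflow
    weights = [math.exp(c - top) for c in cum]
    total = sum(weights)
    return [w / total for w in weights]


def item_information(theta, steps):
    """Item information at theta: the model variance of the item score."""
    probs = pcm_probs(theta, steps)
    mean = sum(k * p for k, p in enumerate(probs))
    return sum((k - mean) ** 2 * p for k, p in enumerate(probs))


def standard_error(theta, items):
    """Conditional standard error: 1 / sqrt(sum of item information)."""
    return 1.0 / math.sqrt(sum(item_information(theta, s) for s in items))


steps = [-2.0, -0.5, 0.5, 2.0]     # hypothetical five-option item
short_scale = [steps] * 4          # a four-item scale
long_scale = [steps] * 18          # an 18-item scale
se_short = standard_error(0.0, short_scale)
se_long = standard_error(0.0, long_scale)
```

Two properties follow from this sketch: a scale with fewer items has a larger standard error at every location, and the standard error grows toward the extremes of the continuum where the items provide little information.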


    Results
Unidimensionality
The unidimensional assumption was met for the enjoyment and the perceived benefits scales but not the perceived barriers scale. The percentage of variance explained by the one-factor solution was adequate for all scales: 52% for the enjoyment scale, 39% for the perceived benefits scale and 23% for the perceived barriers scale. The eigenvalue plot for the enjoyment and perceived benefits scales revealed one dominant factor, whereas the plot for the perceived barriers scale revealed the possibility of two factors. In addition, factor loadings were >0.30 for all enjoyment and perceived benefits scale items but low (0.18 to 0.26) for four items of the perceived barriers scale (Items 4, 21, 22 and 23) and adequate for all other perceived barriers items (0.33 to 0.65). Given these results, a two-factor solution was examined for the perceived barriers scale. This solution explained 33% of the variance. Items 4, 21, 22 and 23 loaded onto one factor while all other items loaded onto a separate factor. The four items represented time demands externally imposed by other individuals (time demand barriers), and the other items represented perceived barriers internal to the individual (internal barriers). Given that the two factors were not meaningfully correlated (r = 0.15), they were analyzed as separate scales for the remaining analyses.

Descriptive statistics
Descriptive characteristics of the scales are presented in Table I, along with internal consistency coefficients. Three extreme outliers (>4 SD from the mean) were found for the perceived benefits scale and were dropped from all analyses. Mean scores suggested that respondents perceived some enjoyment from physical activity, agreed that most items represented true benefits of physical activity, perceived few internal barriers to physical activities and perceived few time constraints to being physically active. The internal consistency of each scale was adequate (Cronbach's α > 0.70).


Table I For scales measuring enjoyment, perceived benefits and perceived barriers, scale descriptives, internal consistency reliability and Rasch model fit criteria

 
Model fit
For all scales, the deviance parameter for the partial credit model was statistically lower than that of the rating scale model (P < 0.001), suggesting a better fit with the partial credit model (see Table I). Computing the percentage of parameters that did not fit each model (i.e. weighted fit indices and their t-statistic outside the acceptable range) confirmed this observation (see Appendix for fit indices of the partial credit model item difficulties). The percentage of item difficulty parameters fitting was higher in the partial credit model than in the rating scale model for the enjoyment and perceived internal barriers scales but the same for the perceived benefits and perceived time demand barriers scales. However, for the four scales, all the parameter estimates for the item by response category were outside acceptable ranges in the rating scale model, while they were all within acceptable ranges in the partial credit model for the perceived barriers scales and mostly within acceptable ranges for the enjoyment and perceived benefits scales (data not shown but available upon request). These results indicated that for each of the four scales, distances between response options were not the same across items, suggesting the partial credit model fit better.

Item fit
Two items with weighted fit indices >1.33 with large t-statistics were flagged. They represented items that potentially did not contribute to a scale's ability to differentiate respondents based on the underlying construct because they may have measured a different dimension of the construct [29]. The items were Item 5 from the enjoyment scale [Mean Square item fit index (MNSQ) of Item 5 = 2.57, t = 14.1; MNSQ of Item 5 Option 6 = 1.37, t = 2.1] and Item 1 from the perceived benefit scale (MNSQ of Item 1 = 1.61, t = 6.1; MNSQ of Item 1 Option 1 = 1.59, t = 3.3). Item 5 from the enjoyment scale seems to describe a person's mental state while performing physical activities (i.e. ‘being absorbed’ versus ‘not absorbed’ by physical activity), while the other enjoyment items describe strong negative or positive emotions that occur during and after performing physical activities. Item 1 from the perceived benefit scale appears to focus on an immediate change after a bout of physical activity (‘feeling less depressed or bored’), whereas the other items seem to address long-term benefits.

Assessment of items by thresholds difficulty with item-person maps
The Wright item-person maps show the location of items by thresholds and of persons along the Rasch scale continuum (i.e. total scale scores on a logit scale with mean = 0.00 and standard deviation unconstrained). Fig. 1 provides, as an example, the item-person map for the enjoyment scale. Data for the other scales are not shown but are available upon request. In this map, the person distribution ranged from respondents who had high scores on the scale (x's located at the top left of the map) to respondents who had low scores on the scale (x's located at the bottom left of the map). In the example, respondents perceiving the greatest enjoyment from physical activities are located at logits ~3.00, and those perceiving the least amount of enjoyment are located at logits around –3.00. Few respondents were found to be at either extreme. Most respondents were located at logits between –1.00 and +1.00, a moderate range of perceived enjoyment from physical activity. To the right of the person distribution is the items by thresholds distribution with the thresholds number shown at the top of Fig. 1. On the seven-point enjoyment scale, there are six thresholds (total number of response options minus one). Threshold 1, for example, represents a 0.50 probability of selecting Option 2 over 1, and Threshold 2 represents a 0.50 probability of selecting Option 3 over Options 1 and 2. On the Wright map, a threshold is shown for each item (hence, the name item by thresholds) and is denoted by the item number on the map.


Fig. 1 Example of a Wright item-person map: map of the enjoyment scale.

 
As shown in Fig. 1, item by threshold location increases on the Rasch scale continuum as we move from Threshold 1 to Threshold 6, and it is a typical pattern for a Likert scale. For example, Threshold 1 for Item 18 is located at ~–2.00 on the Rasch scale continuum. This location signifies that respondents must have perceived at least moderately low enjoyment from physical activity to have a 50% probability of selecting Option 2 over Option 1. In contrast, at Threshold 6, Item 18 is located at a logit of ~2.50. At this location, respondents must have perceived a moderately high level of enjoyment from physical activity to have a 50% probability of selecting Option 7 over all other response options. It should be noted that at each threshold, Item 5 is higher on the Rasch scale continuum than any other item. This finding signifies that the level of enjoyment required to have a 50% likelihood of selecting, for example, Option 4 versus Options 1, 2 and 3 was greater for Item 5 than for any other item on the scale. In other words, respondents who perceived the greatest amounts of enjoyment from physical activity tended to select higher response options for this item than they selected for the other items.

In addition, the map indicates whether items by thresholds occupy the same location on the Rasch scale continuum as respondents. Items by thresholds at the same location as persons are said to provide content coverage for these individuals [29]. For example, Threshold 2 of all items except Item 5 provided content coverage for respondents perceiving moderately low enjoyment from physical activity (at logits ranging from –1.50 to –2.00), whereas Threshold 6 of all items provided content coverage for respondents perceiving moderately high enjoyment (at logits ranging from 1.50 to 3.00). Areas of the logit scale where items by thresholds were located but where no person was located indicate items by thresholds that provided content coverage that was not applicable to any of the respondents. On the enjoyment scale, Threshold 1 of most items provided coverage for respondents located at logits ranging from –2.00 to –3.20, but few respondents were located in this range (i.e. few respondents perceived this low level of enjoyment). Conversely, areas of the scale where respondents were located but where no items by thresholds were located indicate that no items on the enjoyment scale provided content for respondents at these logits on the Rasch scale continuum. On the enjoyment scale, the items by thresholds did not provide coverage for respondents located at logits <–4.00 or >3.00 (i.e. respondents perceiving the lowest or highest levels of enjoyment); however, few respondents were located at these extremes. Overall, because the items by thresholds covered all respondents' locations except those at the extremes, the enjoyment scale provided good content coverage.

The Wright maps for the other scales revealed a slightly different pattern (data not shown but available upon request). For the perceived benefits scale, the items by thresholds provided adequate content coverage for respondents having low to moderate levels of perceived benefits (logits ≤ 1.0) but not for those having moderately high to high levels of perceived benefits (logits > 1.0). For the perceived internal barriers scale, the items by thresholds provided coverage for respondents across the Rasch scale continuum; however, most respondents had logits between –1.50 and 1.50, indicating the scale included content not needed for the sample. Finally, the Wright map for the perceived time demand barriers scale contained few items, but these items were spaced evenly across the Rasch scale continuum, suggesting that the content matched the distribution of the respondents.

Assessment of response options with item characteristic curves
How well each scale's response options targeted respondents was further assessed through examination of the ICCs. An ICC shows the probability of selecting each response option of an item at each logit along the Rasch scale continuum. Each response option is represented as a curve, and its probability of being selected changes over the Rasch scale continuum. For a scale that targeted all the respondents well, all response option curves should have peaked within a range of –3.00 and 3.00 logits, signifying that each option had a greater probability than all other options of being selected at some point along the Rasch scale continuum. Fig. 2 shows two examples of ICC plots (other ICC plots are available upon request), and the tables in the appendices include conclusions drawn about the scales' response formats from analyses of all ICC plots created.


Fig. 2 Examples of item characteristic curves.

 
The first example is Item 5 from the perceived internal barriers scale. This item targeted the respondents well because all response options had the highest probability of being selected at some point along the Rasch scale continuum. This finding is evident in the figure where each of the five response option curves peaks (i.e. is higher than all other curves). For Item 5, ‘never’ peaks on the left side of the graph, followed by ‘rarely’, ‘sometimes’, ‘often’ and then ‘very often’ as we move from the left to right in the figure. The figure suggests that respondents perceiving very few internal barriers (e.g. logit of –3.00) were most likely to select ‘never’ as their response, while those perceiving few internal barriers (e.g. logit of –1.50) were most likely to select ‘rarely’.

In contrast, Item 13 from the perceived benefits scale exemplifies an item that targeted respondents poorly. The curves of the first three response options (‘strongly disagree’, ‘somewhat disagree’ and ‘neutral’) never peaked, indicating that all participants, even those perceiving few benefits of physical activity (i.e. at logits –3.00 to –1.00), were likely to select the other options (‘somewhat agree’ and ‘strongly agree’) over the first three options.

For the enjoyment scale, review of the curves indicated that Options 2 and 3 were problematic. Option 2 did not peak for four items and barely peaked (i.e. peaked over a short range of the Rasch scale continuum) for another six items. Option 3 did not peak for nine items and barely peaked for two others. In short, at least one of these options did not peak or barely peaked for each of the 18 items. These results suggest that the seven-point response format did not appropriately target respondents. A five- or six-point response format may have targeted them better. For items measuring perceived benefits, the five-point scale appeared to target respondents poorly. A two-point scale would appear to have more appropriately targeted them because Options 3 (‘somewhat agree’) and 4 (‘strongly agree’) were the only ones to consistently peak across the items. On the five-point perceived internal barriers scale, the ‘often’ response did not peak for eight items and barely peaked for two items, indicating the scale might have targeted respondents better if it had been a four-point scale. On the perceived time demand barriers scale, the use of five response options targeted respondents well in two items. A four-point scale might have performed better for the other two items because the ‘never’ option did not peak for one item and the ‘very often’ option did not peak for the other.

Assessment of standard error of measurement
Fig. 3 shows the standard error of measurement at each logit along the Rasch scale continuum. The enjoyment and perceived internal barriers scales each had low measurement error for respondents at each logit (<0.60). Therefore, measurement error was low for respondents perceiving low enjoyment or few internal barriers as well as those perceiving great enjoyment or a high number of internal barriers. The perceived benefits scale had low measurement error for respondents at each logit except for those at logits of 1.18 or greater (a raw score ≥ 53), who were those perceiving a moderately-high to high number of benefits. For the perceived time demand barriers scale, the standard error of measurement was high for respondents at all logits (>0.60). The person separation reliability was 0.93, 0.78, 0.87, 0.80 for the enjoyment, the perceived benefits, the perceived internal barriers and the perceived time barrier scales, respectively.


Fig. 3 Standard error of measurement of the enjoyment, perceived benefits, perceived internal barriers and perceived time demand barriers scales along the Rasch scale continuum.

 

    Discussion
The goal of this study was to use Rasch modeling to assess the psychometric properties of three scales measuring constructs reported in the scientific literature to correlate with and possibly mediate physical activity behavior [5, 30]. These scales are the Benefits of Physical Activity Scale [11], the Barriers to Physical Activity Scale [11] and the PACES [12]. The analyses confirmed a unidimensional structure for the enjoyment and perceived benefits scales but not for the perceived barriers scale. Our results suggest that the perceived barriers scale had two uncorrelated dimensions, and, therefore, for the Rasch analyses, the perceived barriers scale was treated as two separate scales. For all four scales, the data fit the Rasch partial credit model well. The main analyses suggest that (i) two items, one enjoyment item and one perceived benefit item, did not fit their respective scales from either a statistical or a content perspective; (ii) the perceived benefits scale did not provide adequate content coverage for all respondents; (iii) standard errors of measurement of the perceived benefits and perceived time demand scales were high for some respondents and (iv) the response options of the scales were not targeting the respondents appropriately.

In general, the four scales each represented one strong dimension. Only one enjoyment item and one perceived benefits item did not differentiate respondents based on the underlying construct, suggesting that they were each measuring a minor dimension not tapped by the primary construct measured by the scale. The anchors of the problematic enjoyment item, Item 5, were ‘I am very absorbed in physical activity’ and ‘I am not at all absorbed in physical activity’. Respondents may have perceived these anchors to describe mental states of being that occur during physical activity participation and other item anchors to represent strong negative and positive feelings about physical activity occurring during and after participation in physical activities. Therefore, respondents may have responded to Item 5 differently than they responded to the other items. The problematic perceived benefits item was Item 1, which measured agreement that ‘feeling less depressed and/or bored’ was a benefit of physical activity. For respondents, this item may have represented a temporary benefit of physical activity. They may have perceived that they would feel less depressed or bored temporarily during a bout of physical activity, whereas they may have perceived the benefits described in all other items to be long-lasting changes. Although these items may be important to measure, our results suggest that they should be removed from their respective scales because they do not contribute to the measurement of the constructs assessed with these scales.

Examination of the Wright maps and standard errors of measurement revealed the following: limited content coverage for the perceived benefits scale and high measurement errors for this scale as well as for the perceived time demands scale. On the perceived benefits scale, content was not provided for respondents perceiving that physical activity offers many benefits, and standard errors of measurement were high for these respondents. These results suggest that this scale may not be appropriate to use in populations fitting this description. As a consequence, the scale may be limited in its ability to detect change over time if individuals perceiving a moderate number of benefits are expected to perceive more benefits over time. Adding content with items targeting areas with less coverage would improve the scale. With the time demand barriers scale containing only four items, decreasing the standard errors of measurement may be achieved by increasing the number of items included in this scale. In short, the most important conclusion regarding content coverage and standard errors of measurement is that in their current forms, the perceived benefits and perceived time demand scales are limited in their ability to measure the underlying constructs in some populations, which may explain the difficulty in identifying the roles of these constructs as mediators of physical activity [5, 16, 31, 32].
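
The link between scale length, targeting and the standard error of measurement can be illustrated with a short sketch. Under Rasch-family models, the information an item provides at a given person location equals the model variance of the item score, and the standard error is the reciprocal square root of the summed information. The step difficulties below are hypothetical values chosen only for illustration; they are not estimates from this study.

```python
import math

def pcm_probs(theta, deltas):
    """Partial credit category probabilities for one item."""
    logits = [0.0]
    for d in deltas:
        logits.append(logits[-1] + (theta - d))
    mx = max(logits)
    expv = [math.exp(v - mx) for v in logits]
    total = sum(expv)
    return [e / total for e in expv]

def item_information(theta, deltas):
    """Item information = variance of the item score under the model."""
    p = pcm_probs(theta, deltas)
    mean = sum(k * pk for k, pk in enumerate(p))
    return sum((k - mean) ** 2 * pk for k, pk in enumerate(p))

def standard_error(theta, items):
    """Standard error of the person estimate at theta for a set of items."""
    info = sum(item_information(theta, deltas) for deltas in items)
    return 1.0 / math.sqrt(info)

# Hypothetical scales: every item has steps centered on 0 logits.
four_items = [[-1.0, 0.0, 1.0]] * 4
eight_items = four_items * 2

for theta in (-3.0, 0.0, 3.0):
    print(theta,
          round(standard_error(theta, four_items), 2),
          round(standard_error(theta, eight_items), 2))
```

In this sketch the standard error is smallest where the items are located and grows for respondents far from them, and doubling the number of comparable items reduces it by a factor of the square root of two. Both patterns are consistent with the suggestions to add items to the time demand barriers scale and to add content for respondents who perceive many benefits.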

The major conclusion drawn from the evaluation of the ICCs was that the response options were not targeting all respondents. One option for all scales would be to reduce the number of response options for only those items that did not target respondents appropriately. An advantage of Rasch modeling is that using different ranges of responses across the items does not have the same effect on the total scale score as it does with CTT (i.e. with CTT, items become weighted differently). A different option would be to make the same changes to the response options for all items of a scale. For example, the response format of the enjoyment items might target our respondents better if it were reduced to a five- or six-point scale. Respondents perceiving moderately low enjoyment from physical activity appeared to find Options 2 and 3 less appealing than other options, although these are the options that they would be expected to use. Another way to revise the scale would be to give each response option a descriptor to help respondents understand the meaning of each option. In a validation of this scale among youth, the response format was reduced to five options [19]. For the perceived benefits scale, respondents perceiving few benefits of physical activity were not most likely to select Option 0 (‘strongly disagree’) or 1 (‘somewhat disagree’) over the other options, and those perceiving a moderate number of benefits were not most likely to select Option 2 (neutral) over the other options, even though these are the options they would be expected to use. This pattern indicates that the scale was being used as a two-point scale. These respondents may have reacted to what they perceived to be the socially appropriate answer. In our society, there is implicit agreement that physical activity is good, and, therefore, these respondents may have felt that Options 0, 1 and 2 did not reflect the prevailing societal attitudes. To improve the scale, investigators may want to consider revising the item descriptors, revising the instructions for scale completion or reducing the number of response options to only two. The perceived internal barriers scale also targeted respondents poorly as a five-point scale. Respondents who were expected to select ‘often’ for most items because they perceived a moderately high number of barriers were most likely to select ‘sometimes’ (Option 2) or ‘very often’ (Option 4) instead. This finding suggests that respondents were unable to differentiate between ‘sometimes’ and ‘often’, indicating that the ‘often’ option should be renamed or deleted.
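
One plausible way to read the statement that an item ‘targeted respondents as a k-point scale’ is that only k of its response options are ever the most probable choice at some point along the continuum. The sketch below, which assumes the partial credit model and uses hypothetical step difficulties rather than estimates from this study, counts the options that are modal somewhere on a grid of person locations.

```python
import math

def pcm_probs(theta, deltas):
    """Partial credit category probabilities for one item."""
    logits = [0.0]
    for d in deltas:
        logits.append(logits[-1] + (theta - d))
    mx = max(logits)
    expv = [math.exp(v - mx) for v in logits]
    total = sum(expv)
    return [e / total for e in expv]

def modal_categories(deltas, lo=-6.0, hi=6.0, step=0.05):
    """Response options that are the most probable choice somewhere on the grid."""
    seen = set()
    n_points = int(round((hi - lo) / step))
    for i in range(n_points + 1):
        theta = lo + i * step
        p = pcm_probs(theta, deltas)
        seen.add(max(range(len(p)), key=p.__getitem__))
    return sorted(seen)

# Hypothetical five-option items (options 0-4).
ordered_steps = [-2.0, -0.5, 0.5, 2.0]      # every option has its own region
disordered_steps = [-1.0, 1.0, -0.5, 2.0]   # option 2 is never the most probable

print(modal_categories(ordered_steps))      # [0, 1, 2, 3, 4]
print(modal_categories(disordered_steps))   # [0, 1, 3, 4]
```

Under this reading, the second item behaves as a four-point scale even though five options are offered, which parallels the situation in which respondents expected to choose ‘often’ instead chose ‘sometimes’ or ‘very often’.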

The methods used in the paper to assess the psychometric properties of four scales focused on the scales' structural properties in a sample of healthy, community-living adults. It should be noted that only one aspect of construct validity was assessed, and, consequently, the study does not provide a full validation of the enjoyment, perceived benefits and perceived barriers scales. Given that these scales were administered to a population of inactive adults, the generalizability of our findings is restricted to inactive adults.

Even so, the results add to the evidence supporting the construct validity of the scales [11–16]. This study also identified ways to improve the scales and demonstrated the valuable contribution that Rasch modeling can make in evaluating the psychometric properties of commonly used scales. Although it is beyond the scope of this paper to compare Rasch and CTT results, the Rasch results do not typically contradict CTT results when the scale scores are normally distributed, as were the scores on the scales used in this study. However, as demonstrated in this paper, Rasch modeling provides more in-depth information about the psychometric properties than is available with CTT. In summary, our findings indicate that the enjoyment, perceived benefits and perceived barriers scales hold promise for further use in physical activity studies, but their effectiveness at detecting differences among individuals and changes over time may be limited unless changes in scale content and response format are made.


    Conflict of interest statement
 
None declared.


Appendix A Summary of results for the 18-item, seven-point enjoyment scale

| Item | Item anchors | Factor loadings | Item difficulty | Item fit indices: MNSQ^a (t-value) | Other findings |
| --- | --- | --- | --- | --- | --- |
| 1 | I enjoy it versus hate it^b | 0.78 | –0.64 | 0.90 (–1.4) | |
| 2 | I feel bored versus interested | 0.61 | –0.36 | 1.31 (4.0) | |
| 3 | I dislike it versus like it | 0.75 | –0.50 | 0.85 (–2.1) | Targeted respondents as a six-point scale. |
| 4 | I find it pleasurable versus unpleasurable^b | 0.68 | –0.25 | 1.14 (1.9) | |
| 5 | I am very absorbed versus not at all absorbed in physical activity^b | 0.34 | 1.05 | 2.57 (14.1) | Item misfit and response Option 6 misfit (MNSQ = 1.37, t = 2.1) suggest item may be measuring a different dimension of enjoyment than other items. Targeted respondents as a six-point scale. |
| 6 | It's no fun at all versus a lot of fun | 0.73 | –0.35 | 0.90 (–1.4) | Targeted respondents as a six-point scale. |
| 7 | I find it energizing versus tiring^b | 0.68 | –0.58 | 1.22 (2.7) | |
| 8 | It makes me depressed versus happy | 0.63 | –0.83 | 1.23 (2.5) | Misfit of response Option 1 (MNSQ = 1.75, t = 2.4). Targeted respondents as a five-point scale. |
| 9 | It's very pleasant versus very unpleasant^b | 0.85 | –0.46 | 0.73 (–4.0) | Item misfit and response Option 7 misfit (MNSQ = 0.62, t = –3.3) suggest content is redundant. Targeted respondents as a six-point scale. |
| 10 | I feel good versus bad physically while doing it^b | 0.72 | –0.77 | 1.12 (1.6) | |
| 11 | It's very invigorating versus not at all invigorating^b | 0.80 | –0.90 | 0.90 (–1.3) | Targeted respondents as a six-point scale. |
| 12 | I am very frustrated versus I am not at all frustrated by it | 0.64 | –0.50 | 1.28 (3.4) | Targeted respondents as a five-point scale. |
| 13 | It's very gratifying versus not at all gratifying^b | 0.86 | –0.86 | 0.72 (–4.0) | Item misfit and response Option 7 misfit (MNSQ = 0.71, t = –2.9) suggest content is redundant. |
| 14 | It's very exhilarating versus not at all exhilarating^b | 0.84 | –0.57 | 0.82 (–2.5) | Misfit of response Option 7 (MNSQ = 0.71, t = –2.5). Targeted respondents as a six-point scale. |
| 15 | It's not at all stimulating versus very stimulating | 0.73 | –0.83 | 1.02 (0.3) | |
| 16 | It does versus does not give me a strong sense of accomplishment^b | 0.64 | –0.95 | 1.36 (4.1) | Item misfit but no response option misfit. Targeted respondents as a six-point scale. |
| 17 | It's very refreshing versus not at all refreshing^b | 0.82 | –0.74 | 0.87 (–1.7) | Targeted respondents as a six-point scale. |
| 18 | I felt as though I would rather be doing something else versus there was nothing else I would rather be doing | 0.69 | 0.07 | 1.10 (1.4) | Targeted respondents as a six-point scale. |

Total enjoyment scale results and suggestions

Results: The scale was unidimensional and provided adequate content coverage across the Rasch scale continuum. The seven-point response option format targeted respondents poorly. The scale had low standard errors of measurement across the Rasch continuum. Item 5 did not fit well, and Items 9 and 13 contained redundant content.

Suggestion: Reduce response options to five or six options and provide descriptors with each response option. Remove Item 5 and either Item 9 or Item 13.

^a Weighted mean square fit statistics of item difficulty parameters are presented. Bolded values represent misfit.

^b Reverse coded for analyses.


Appendix B Summary of results for the 14-item, five-point perceived benefits scale

| Item | Item descriptions | Factor loadings | Item difficulty | Item fit indices: MNSQ^a (t-value) | Other findings |
| --- | --- | --- | --- | --- | --- |
| 1 | Feel less depressed and/or bored | 0.35 | –1.08 | 1.61 (6.1) | Item misfit and response Option 1 misfit (MNSQ = 1.59, t = 3.3) suggest item may be measuring a different dimension of perceived benefits than other items. Targeted respondents as a four-point scale. |
| 2 | Improve self-esteem | 0.47 | –2.09 | 1.14 (1.4) | Targeted respondents as a four-point scale. |
| 3 | Meet new people | 0.33 | –1.24 | 1.53 (5.9) | Item misfit but no response option misfit. Targeted respondents as a four-point scale. |
| 4 | Lose weight | 0.50 | –2.35 | 1.12 (1.2) | Targeted respondents as a four-point scale. |
| 5 | Build up muscle strength | 0.67 | –2.54 | 0.85 (–1.5) | Targeted respondents as a two-point scale. |
| 6 | Feel less tension and stress | 0.57 | –2.42 | 0.94 (–0.7) | Targeted respondents as a three-point scale. |
| 7 | Improve health or reduce risk of disease | 0.68 | –2.95 | 0.85 (–1.7) | Targeted respondents as a two-point scale. |
| 8 | Do better on my job | 0.56 | –1.64 | 1.07 (0.9) | Targeted respondents as a three-point scale. |
| 9 | Feel more attractive | 0.68 | –2.17 | 0.90 (–1.1) | Targeted respondents as a three-point scale. |
| 10 | Improve heart and lung fitness | 0.71 | –3.36 | 0.86 (–1.6) | Targeted respondents as a two-point scale. |
| 11 | Gain muscle | 0.71 | –2.39 | 0.88 (–1.4) | Targeted respondents as a three-point scale. |
| 12 | Improve muscle tone | 0.75 | –2.90 | 0.79 (–2.0) | Targeted respondents as a two-point scale. |
| 13 | Feel better about my body | 0.75 | –2.74 | 0.77 (–2.1) | Targeted respondents as a two-point scale. |
| 14 | Increase energy level | 0.75 | –1.94 | 0.74 (–3.7) | Item misfit and Option 7 misfit (MNSQ = 0.72, t = –5.5), but content does not appear to be redundant with other content. Targeted respondents as a two-point scale. |

Total perceived benefits scale results and suggestions

Results: The scale is unidimensional. It did not contain content for respondents perceiving many benefits of physical activity. The five-point response option format targeted respondents poorly. High standard errors of measurement were found for respondents perceiving many benefits. Item 1 did not fit.

Suggestions: Add items providing content coverage for individuals expected to score high on the scale (i.e. those perceiving many benefits). Reduce response options to two options, revise descriptors of each response option or revise instructions for scale completion. Remove Item 1.

^a Weighted mean square fit statistics of item difficulty parameters are presented. Bolded values represent misfit.


Appendix C Summary of results for the 24-item, five-point perceived barriers scale

Internal barriers

| Item | Item descriptions | Factor loadings^a | Item difficulty | Item fit indices: MNSQ^b (t-value) | Other findings |
| --- | --- | --- | --- | --- | --- |
| 1 | Self conscious about my looks | 0.52 | 0.70 | 0.96 (–0.6) | Targeted respondents as a four-point scale. |
| 2 | Lack interest in physical activity | 0.59 | 0.01 | 1.04 (0.6) | Targeted respondents as a three-point scale. |
| 3 | Lack self-discipline or willpower | 0.34 | –0.86 | 1.22 (2.8) | |
| 5 | Lack energy | 0.60 | –0.43 | 0.97 (–0.5) | |
| 6 | No one to do physical activities with me | 0.39 | –0.02 | 1.18 (2.7) | |
| 7 | Do not enjoy physical activity | 0.68 | 0.40 | 0.96 (–0.5) | |
| 8 | Hate to fail, so I do not try | 0.55 | 1.27 | 0.89 (–1.5) | |
| 9 | Lack equipment | 0.35 | 0.79 | 1.14 (1.8) | Targeted respondents as a four-point scale. |
| 10 | The weather is too bad | 0.40 | 0.94 | 1.06 (0.9) | |
| 11 | Lack skills | 0.63 | 0.83 | 0.85 (–2.0) | Targeted respondents as a four-point scale. |
| 12 | Too tired to exercise | 0.58 | –0.13 | 0.99 (–0.1) | |
| 13 | Lack knowledge on how to do physical activities | 0.57 | 0.80 | 0.93 (–1.1) | |
| 14 | Poor health | 0.43 | 1.40 | 0.96 (–0.4) | Targeted respondents as a four-point scale. |
| 15 | Fear injury | 0.43 | 1.52 | 0.94 (–0.6) | Targeted respondents as a four-point scale. |
| 16 | Physical activity is hard work | 0.67 | 0.68 | 0.88 (–1.8) | |
| 17 | Lack a convenient place to do physical activity | 0.45 | 0.70 | 1.06 (0.9) | |
| 18 | Too overweight | 0.54 | 0.70 | 0.92 (–1.1) | |
| 19 | Physical activity is boring | 0.62 | 0.33 | 0.98 (–0.3) | Targeted respondents as a four-point scale. |
| 20 | Minor aches and pains | 0.39 | 0.80 | 1.07 (1.0) | Targeted respondents as a four-point scale. |
| 24 | Lack money | 0.38 | 0.83 | 1.07 (0.9) | Targeted respondents as a four-point scale. |

Total internal barriers scale: results and suggestions

Results: The scale is unidimensional. The five-point response option format targeted respondents poorly. The scale had low standard errors of measurement across the logit continuum, but the total scale score has a high standard deviation.

Suggestion: Add more items to address the high standard deviation. Rename or delete the ‘often’ response option.

Time demand barriers

| Item | Item descriptions | Factor loadings^a | Item difficulty | Item fit indices: MNSQ^b (t-value) | Other findings |
| --- | --- | --- | --- | --- | --- |
| 4 | Lack time | 0.79 | –0.76 | 1.00 (–0.0) | Targeted respondents as a four-point scale. |
| 21 | Work demands | 0.86 | –0.03 | 0.90 (–1.5) | |
| 22 | Social demands | 0.59 | 0.48 | 1.05 (0.8) | Targeted respondents as a four-point scale. |
| 23 | Family demands | 0.71 | 1.27 | 1.19 (2.5) | |

Total time demand barriers scale: results and suggestions

Results: The scale was unidimensional. Two items targeted respondents as a five-point scale, and two did not. There was no clear indication of which response options should be revised. The scale had high standard errors of measurement across the Rasch scale continuum, and, therefore, provided limited content coverage across the continuum.

Suggestion: Increase the number of items and validate the content of the new items before further analysis of construct validity.

^a The eigenvalue plot and factor loadings indicated the barriers scale as a whole was not unidimensional even though the one-factor solution explained 23% of the variance. An uncorrelated two-factor solution best represented the items. The two barriers scales were unidimensional. A one-factor solution explained 27% of the variance for the internal barriers scale and 55% of the variance for the time demand barriers scale. Loadings of items onto their respective scale are presented here.

^b Weighted mean square fit statistics of item difficulty parameters are presented. No items were misfitting. Weighted fit statistics of item by response category parameters (data not shown) also revealed no misfitting item.


    Footnotes
 
5 Present address: Centre for Community Child Health Research, The University of British Columbia, Vancouver, BC V5H 3V4, Canada

6 Present address: Klein Buendel, Inc., Golden, CO 80401, USA


    Acknowledgements
 
This research was supported in part by National Cancer Institute Grant R25 CA57712. Project PRIME was supported by National Heart, Lung, and Blood Institute Grant HL58608. Special thanks to the Project PRIME participants for their contributions to the study and to The Cooper Institute staff for their work in collecting the data.


    References
 
1. Bock BC, Marcus BH, Pinto BM, et al. Maintenance of physical activity following an individualized motivationally tailored intervention. Ann Behav Med 2001;23:79–87.

2. Dunn AL, Marcus BH, Kampert JB, et al. Reduction in cardiovascular disease risk factors: 6-month results from Project Active. Prev Med 1997;26:883–92.

3. Marshall AL, Baumann AE, Owen N, et al. Population-based randomized controlled trial of a stage-targeted physical activity intervention. Ann Behav Med 2003;25:194–202.

4. Pinto B, Friedman R, Marcus BH, et al. Effects of a computer-based, telephone-counseling system on physical activity. Am J Prev Med 2002;23:113–20.

5. Lewis BA, Marcus BH, Pate RR, et al. Psychosocial mediators of physical activity behavior among adults and children. Am J Prev Med 2002;23:26–35.

6. Lord FM. Applications of Item Response Theory to Practical Testing Problems. Hillsdale, NJ: L. Erlbaum Associates, 1980.

7. Lord FM and Novick MR. Statistical Theories of Mental Test Scores. Reading, MA: Addison-Wesley, 1968.

8. Rasch G. Probabilistic Models for Some Intelligence and Attainment Tests. Chicago, IL: University of Chicago Press, 1960.

9. Rasch G. Probabilistic Models for Some Intelligence and Attainment Tests. Chicago, IL: University of Chicago Press, 1980.

10. Wilson M, Allen D, Corser Li J. Improving measurement in health education and health behavior research using item response modeling: comparison with the classical test theory approach. Health Educ Res 2006;21(Suppl 1):i19–i32.

11. Sallis JF, Hovell MF, Hofstetter CR, et al. A multivariate study of determinants of vigorous exercise in a community sample. Prev Med 1989;18:20–34.

12. Kendzierski D and DeCarlo KJ. Physical Activity Enjoyment Scale: two validation studies. J Sport Exerc Psychol 1991;13:50–63.

13. Felton GM, Ott A, Jeter C. Physical activity stages of change in African American women: implications for nurse practitioners. Nurse Pract Forum 2000;11:116–23.

14. Robbins LB, Pis MB, Pender NJ, et al. Exercise self-efficacy, enjoyment, and feeling states among adolescents. West J Nurs Res 2004;26:699–715.

15. Rovniak LS, Anderson ES, Winett RA, et al. Social cognitive determinants of physical activity in young adults: a prospective structural equation analysis. Ann Behav Med 2002;24:149–56.

16. Sallis JF, Calfas KJ, Alcaraz JE, et al. Potential mediators of change in a physical activity promotion course for university students: Project GRAD. Ann Behav Med 1999;21:149–58.

17. Heesch KC, Mâsse LC, Dunn AL, et al. Does adherence to a lifestyle physical activity intervention predict changes in physical activity? J Behav Med 2003;26:333–48.

18. Crocker PRE, Bouffard M, Gessaroli ME. Measuring enjoyment in youth sport settings: a confirmatory factor analysis of the Physical Activity Enjoyment Scale. J Sport Exerc Psychol 1995;17:200–5.

19. Motl RW, Dishman RK, Saunders R, et al. Measuring enjoyment of physical activity in adolescent girls. Am J Prev Med 2001;21:110–7.

20. Calfas KJ, Sallis JF, Lovato CY, et al. Physical activity and its determinants before and after college graduation. Med Exerc Nutr Health 1994;3:323–34.

21. Sallis JF, Hovell MF, Hofstetter CR, et al. Explanation of vigorous physical activity during two years using social learning variables. Soc Sci Med 1992;34:25–32.

22. Nunnally JC and Bernstein IH. Psychometric Theory. New York: McGraw-Hill, 1994.

23. Reeve BB and Mâsse LC. Item response theory modeling for questionnaire evaluation. In Presser S, Rothgeb JM, Couper MP, Lessler JT, Martin E, Martin J, Singer E (Eds.). Methods for Testing and Evaluating Survey Questionnaires. Hoboken, NJ: John Wiley & Sons, 2004, pp. 247–73.

24. Wu ML, Adams RJ, Wilson MR. ACER Conquest: Generalized Item Response Modeling Software. Melbourne, Australia: The Australian Council for Education Research, 1998.

25. Andrich DA. A rating formulation for ordered response categories. Psychometrika 1978;43:561–73.

26. Masters GN. A Rasch model for partial credit scoring. Psychometrika 1982;49:359–81.

27. Wright BD and Masters GN. Rating Scale Analysis. Chicago, IL: Mesa Press, 1982.

28. Adams RJ and Khoo ST. Quest. Melbourne, Australia: The Australian Council for Education Research, 1991.

29. Wilson M. Constructing Measures: An Item Response Modeling Approach. Mahwah, NJ: Erlbaum, 2005.

30. Sallis JF and Owen N. Physical Activity and Behavioral Medicine. Thousand Oaks, CA: Sage, 1999.

31. Castro CM, Sallis JF, Hickmann SA, et al. A prospective study of psychosocial correlates of physical activity for ethnic minority women. Psychol Health 1999;14:277–93.

32. Nichols JF, Wellman E, Caparosa S, et al. Impact of a worksite behavioral skills intervention. Am J Health Promot 2000;14:218–21.

Received on September 20, 2005; accepted on May 1, 2006

