Health Education Research Advance Access originally published online on October 11, 2006
Health Education Research 2006 21(Supplement 1):i33-i46; doi:10.1093/her/cyl106
© 2006 The Author(s).
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Evaluating the properties of a stage-specific self-efficacy scale for physical activity using classical test theory, confirmatory factor analysis and item response modeling
1 Centre for Community Child Health Research, Department of Pediatrics, University of British Columbia, L408-4480 Oak Street, Vancouver, British Columbia V6H 3V4, Canada
2 School of Human Movement Studies, University of Queensland, Brisbane, Queensland 4972, Australia
3 Center for Health Promotion and Prevention Research, University of Texas Health Science Center at Houston, Houston, TX 77030, USA
4 Graduate School of Education, University of California Berkeley, Berkeley, CA 94720, USA
* Correspondence to: L. C. Mâsse. E-mail: lmasse{at}cw.bc.ca
Abstract
The purpose of this paper was to evaluate the psychometric properties of a stage-specific self-efficacy scale for physical activity with classical test theory (CTT), confirmatory factor analysis (CFA) and item response modeling (IRM). Women who enrolled in the Women On The Move study completed a 20-item stage-specific self-efficacy scale developed for this study [n = 226, 51.1% African-American and 48.9% Hispanic women, mean age = 49.2 (±7.0) years, mean body mass index = 29.7 (±6.4)]. Three analyses were conducted: (i) a CTT item analysis, (ii) a CFA to validate the factor structure and (iii) an IRM analysis. The CTT item analysis and the CFA results showed that the scale had high internal consistency (ranging from 0.76 to 0.93) and a strong factor structure. Results also showed that the scale could be improved by modifying or eliminating some of the existing items without significantly altering the content of the scale. The IRM results also showed that the scale had few items that targeted high self-efficacy and the stage-specific assumption underlying the scale was rejected. In addition, the IRM analyses found that the five-point response format functioned more like a four-point response format. Overall, employing multiple methods to assess the psychometric properties of the stage-specific self-efficacy scale demonstrated the complementary nature of these methods and highlighted the strengths and weaknesses of this scale.
Bandura [1] describes self-efficacy as the belief or the confidence individuals have in their skills and abilities to perform a behavior necessary to reach a desired goal or achieve an expected outcome. Self-efficacy is not meant to be a measure of skills but of the belief individuals have in what they can do with the skills they possess. Given its association with behavior, self-efficacy has been integrated into a number of behavior change theories and models, including social learning theory [1], the transtheoretical model [2], the health belief model [3] and the theory of planned behavior, which includes a related construct [4]. The association between physical activity and self-efficacy has been studied widely [5], along with its ability to predict change in physical activity behavior [6–15].
To date, most self-efficacy scales in the physical activity literature have been developed to measure confidence in one's ability to exercise or be physically active when faced with barriers to being active [16–19]. Given the evidence that suggests a relationship between self-efficacy and the transtheoretical model's stages of change [16–19], creating a stage-specific self-efficacy scale may provide an opportunity to match the content of the scale with the content of a stage-matched intervention (i.e. matched to respondents' stages of change). Therefore, the purpose of this paper was to demonstrate the usefulness of using classical test theory (CTT), confirmatory factor analysis (CFA) and item response modeling (IRM) to evaluate the psychometric properties of a newly developed scale that measures stage-specific self-efficacy.
Although it has been almost two decades since IRM was first introduced to the field of physical activity [20], IRM is seldom employed to assess the psychometric properties of a scale in the area of physical activity [21]. It is beyond the scope of this paper to provide a full introduction to IRM, as a number of comprehensive presentations are provided elsewhere [22–25] as well as presentations specific to the area of physical activity [20, 26–30].
Methods
Respondents
Data for this analysis were obtained from the Women On The Move (WOTM) study, a 5-year project funded by the Women's Health Initiative through the Centers for Disease Control and Prevention (CDC) aimed at developing and validating physical activity surveys for minority women. African-American and Hispanic women aged 40–70 years residing in Houston, TX, USA, were recruited for the WOTM study. To participate in the study, women had to meet the following criteria: (i) no health limitations that prevented them from being physically active, (ii) not pregnant or planning to become pregnant during the study and (iii) no plans to move out of the geographical area within the next year. Women were recruited to participate in the study through the media (print, television and radio), community presentations and posting of flyers. After recruitment, a total of 656 women (311 African-American and 340 Hispanic) expressed interest in the study. Of those, 590 women were screened by telephone to assess their eligibility, 386 were then screened in person and 260 women (130 African-American and 130 Hispanic) were enrolled in the study.
Protocol
The human subjects committees at the University of Texas–Houston Health Science Center and the CDC approved the WOTM protocol. Those who met all study eligibility criteria (assessed by telephone and at the in-person screening) completed an intensive 3-week observational protocol and had a 6-week follow-up assessment (protocol described in [31]). At the follow-up assessment, the respondents completed a physical activity questionnaire and the Correlates of Physical Activity Questionnaire (CPAQ). Two-thirds of the respondents (62%) were asked to participate in a reliability study as part of the 6-week follow-up. Those who agreed completed the CPAQ and a physical activity questionnaire during an in-person meeting. The remaining respondents were asked to complete the questionnaires by mail. In both cases, the CPAQ was self-administered and respondents received the same battery of instruments. The CPAQ assessed beliefs, normative modeling, perceived barriers, outcome expectations, stage-specific self-efficacy and stages of motivational readiness to change physical activity behavior. The stage-specific self-efficacy data collected at the 6-week follow-up were analyzed for this paper. Those who completed the 6-week follow-up received a $20 cash incentive. Anthropometric and demographic information obtained at the time of enrollment in the WOTM study is reported in this paper.
Instruments
Demographic items
Demographic information was collected at the in-person screening. Respondents completed an eight-item demographic questionnaire and provided their age, race/ethnicity, primary language, occupation, household income, highest educational level obtained and any health conditions that would prevent them from being physically active.
Self-efficacy measure
A 20-item self-efficacy questionnaire was developed that included four sub-scales designed to assess stage-specific self-efficacy (i.e. self-efficacy in moving from contemplation, preparation, action/maintenance and moving out of relapse). As the overall scale assessed self-efficacy in becoming active, no pre-contemplation sub-scale was included because it was assumed that respondents had to at least be thinking about being physically active before it would be appropriate to measure their self-efficacy in becoming active. In addition, a relapse sub-scale was added to assess confidence in resuming an active lifestyle after a person had stopped being active. In developing the scale, it was assumed that the sub-scales would be correlated and would measure overall self-efficacy. Furthermore, it was assumed that the sub-scales would be differentiated by stages on the self-efficacy continuum, where the contemplation items would require lower self-efficacy to overcome the behaviors targeted by these items than the preparation and action/maintenance items. Items were generated by reviewing other self-efficacy scales and by matching the content with the operational definition for each sub-scale [32].
The sub-scale addressing the contemplation stage measured confidence in committing to being physically active and included four statements that started with the following stem: "Assume that you are currently thinking about being physically active". These statements assessed confidence in being physically active, being physically active on a regular basis, making a commitment to being physically active on a regular basis and starting to be physically active in the next few weeks. The sub-scale addressing the preparation stage measured confidence in developing a plan of action to be physically active and included seven statements that started with the following stem: "Assume that you have decided to be physically active on a regular basis". These statements assessed confidence in finding an enjoyable physical activity program, finding a convenient and safe place to be physically active, finding time to be physically active, being able to schedule physical activity, finding a partner for physical activity and finding ways to be physically active through bad weather. The action/maintenance sub-scale addressed confidence in preventing relapse and had statements beginning with "Assume that you have been physically active on a regular basis for 3 months". Four statements followed to assess confidence in maintaining physical activity when family and work responsibilities are more demanding than usual, during the holidays and over the next 3 months. A fifth statement assessed confidence in keeping physical activity enjoyable. The relapse sub-scale measured confidence in dealing with relapse and resuming a physical activity program once the person had relapsed, and included statements related to the potential to move from the action or maintenance stage back to an earlier stage. This sub-scale began with the following instruction: "Assume that you were physically active on a regular basis but in the last 3 weeks you stopped being active". Four statements followed to assess confidence in beginning a physical activity program again, committing to being physically active again, setting a regular physical activity routine and feeling comfortable about being physically active again.
A five-point scale was used as the response format for all items (1 = extremely confident, 2 = very confident, 3 = rather confident, 4 = somewhat confident, and 5 = not confident at all).
Stages of physical activity measure
Five items for stages of exercise of Marcus et al. [17], modified for physical activity behaviors, served to assess stages of physical activity. The scale measured the following stages: pre-contemplation, contemplation, preparation, action and maintenance. The action and maintenance stages were defined as doing moderate physical activity for 30 min day⁻¹ on at least 5 days of the week; respondents were classified as being in maintenance if they had been active at this level for at least 3 months.
Analyses
CTT item analysis
The SPSS (SPSS Windows version 9.0) reliability subroutine was used to conduct a CTT item analysis that consisted of computing the item-total correlation and Cronbach's coefficient alpha (reliability index), which is a measure of internal consistency. The item-total correlation evaluated the extent to which each item in the test discriminated. Any item-total correlation of <0.30 [33] indicated that the item discriminated poorly. Using lower bound criteria for reliability of Nunnally and Bernstein [33], a Cronbach's coefficient alpha with a value of at least 0.70 was considered adequate.
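The two CTT statistics described above can be illustrated with a minimal Python sketch. It is not the SPSS reliability subroutine used in the study; the function names and the toy response matrix are hypothetical, and sample (n − 1) variances are assumed.

```python
from statistics import mean, variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def corrected_item_total(scores, item):
    """Correlate one item with the total of the remaining items."""
    item_scores = [row[item] for row in scores]
    rest_scores = [sum(row) - row[item] for row in scores]
    return pearson(item_scores, rest_scores)


# Hypothetical responses: 5 respondents x 4 items on a five-point format.
toy = [
    [1, 2, 1, 2],
    [2, 2, 3, 2],
    [3, 4, 3, 3],
    [4, 4, 5, 4],
    [5, 5, 4, 5],
]
alpha = cronbach_alpha(toy)
citc_first_item = corrected_item_total(toy, 0)
```

Under the criteria stated above, an item would be flagged when its corrected item-total correlation falls below 0.30, and a scale would be considered adequate when alpha is at least 0.70.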
Confirmatory factor analysis
CFA, using LISREL software [34], served to validate the hierarchical factor structure of the stage-specific self-efficacy scale. Given that the data were not skewed or kurtosed severely, parameter estimates were obtained using the maximum likelihood estimation procedure. As there is no standard for determining model fit, the criteria of Hu and Bentler [35] for evaluating model fit were followed. The χ² goodness-of-fit test served to determine the overall fit of the model, with a P-value >0.15 indicating that the residuals no longer were significant, hence a good fit. Given that the χ² goodness-of-fit test is affected by sample size and the distributional properties of the items, other indices of fit also were evaluated. Steiger's root mean square error of approximation (RMSEA) was examined, with a value ≤0.05 indicating a good fit and an upper value of 0.08 representing a reasonable fit. Both the comparative fit index (CFI) and the non-normed fit index (NNFI) were examined. These indices compare the fit of the model to a baseline model with values bounded between zero and one. For both indices, a value >0.95 indicates a good fit [35].
IRM analyses
The ConQuest software [36], using the Rasch family of logistic models, was employed for its ability to model the ordinal response of the data and to take into account the size of the data available for the analysis. The first step in the analysis was to determine the appropriate model that fitted the data. Both the partial credit [37] and the rating scale [38] models were fitted to the data. The partial credit model does not assume that the distances between ordinal responses are the same for all items (e.g. the distance between Response Options 1 and 2, 2 and 3, etc., is not the same across all items). In contrast, the rating scale model assumes that the distances between ordinal responses are the same for all items (e.g. the distance between Response Options 1 and 2, 2 and 3, etc., is the same across all items). The best-fitting model was assessed by comparing the deviance parameters as well as the weighted fit indices for the items and respondents (i.e. infit statistics). The evaluation criteria of Adams and Khoo [39] were used, where a weighted mean square value <0.75 or >1.33 served to identify misfit. For the weighted t statistic, any value <−2.00 or >2.00 was indicative of misfit. Items or respondents that had both unacceptable values for the weighted mean square statistic and t statistic were flagged as having high residuals.
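The difference between the two models can be sketched in Python using the generic textbook form of the partial credit model. This is an illustration only: it does not reproduce ConQuest's parameterization or estimation, and the function names and all parameter values are hypothetical.

```python
import math


def pcm_probabilities(theta, thresholds):
    """Response-category probabilities under the partial credit model.

    theta: respondent location in logits.
    thresholds: the item's step parameters (logits), one per adjacent pair
    of response options. Returns len(thresholds) + 1 probabilities.
    """
    # Cumulative sums of (theta - step); the lowest category has an empty sum.
    exponents = [0.0]
    for step in thresholds:
        exponents.append(exponents[-1] + (theta - step))
    peak = max(exponents)  # subtract the maximum for numerical stability
    weights = [math.exp(e - peak) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


def rsm_thresholds(item_location, shared_offsets):
    """Rating scale model: every item uses the same offsets around its location."""
    return [item_location + offset for offset in shared_offsets]


# Partial credit: each item has its own, freely spaced thresholds (hypothetical).
item_a_steps = [-3.0, -1.0, 0.5, 2.0]
probs = pcm_probabilities(0.5, item_a_steps)

# Rating scale: two items differ only by a shift of the same threshold pattern.
offsets = [-2.0, -0.5, 0.5, 2.0]
item_b_steps = rsm_thresholds(0.0, offsets)
item_c_steps = rsm_thresholds(0.4, offsets)
```

When a respondent's location equals a threshold, the two adjacent response options are equally likely, which is why thresholds can be placed on the same logit scale as respondents.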
The IRM analyses served to (i) estimate the location of the items on the self-efficacy continuum and scale the respondents' ability on the same metric as the items, (ii) assess content representation by evaluating the item location on the self-efficacy continuum and verify if the item locations are differentiated by stages, (iii) evaluate the functioning of the five-point response scale, (iv) estimate the standard error of measurement, and (v) estimate the reliability of the scale. Visual inspection of the option characteristic curves (i.e. plots showing the probability of selecting a given response option as a function of self-efficacy) served to evaluate the functioning of the five-point response format. Such evaluation determined, for example, that the five-point response format functioned in practice more like a four-point response format. An item-respondent map served to evaluate the location of the items and to assess if the items and respondents were appropriately targeted. This map served to qualitatively assess content representation by determining if the item assessed low, moderate or high levels of self-efficacy. Finally, similar to the CTT item analysis, person standard error of measurement and reliability were assessed. Unlike a CTT item analysis, however, the standard error of measurement and reliability (derived from the information function) of the scale are a function of self-efficacy. By assuming that the respondents' ability is normally distributed, the conditional reliability was estimated as follows: 1/(1 + (conditional standard error)²). Scale precision was assessed by estimating 95% confidence intervals. Any section of the reliability function that is <0.70 served to identify areas where the stage-specific self-efficacy scale lacked adequate reliability.
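As a worked consequence of the conditional reliability expression above, the 0.70 criterion can be restated as a ceiling on the conditional standard error. The symbols θ (self-efficacy location) and SE(θ) (conditional standard error) are notation introduced here for illustration only.

```latex
\[
\frac{1}{1 + SE(\theta)^{2}} \ge 0.70
\;\Longleftrightarrow\;
SE(\theta)^{2} \le \frac{1}{0.70} - 1 = \frac{3}{7}
\;\Longleftrightarrow\;
SE(\theta) \le \sqrt{3/7} \approx 0.65
\]
```

Under this expression, any region of the continuum where the conditional standard error exceeds roughly 0.65 would fall below the 0.70 reliability criterion.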
Results
Respondents
Of the 260 women who enrolled in the study, 246 (121 African-American and 125 Hispanic) completed the self-efficacy scale and 226 women (51.1% African-American and 48.9% Hispanic women) had available data for the analyses. Demographic characteristics of the study sample are summarized in Table I. The mean age of the sample was 49.2 (±7.0) years, the mean weight was 76.3 (±17.2) kg and the mean body mass index was 29.7 (±6.4). Many of the respondents (43.3%) had an income of <$25 000 per year, and 33.3% were college graduates. The women were classified into one of five stages of physical activity as follows: 2.8% were in pre-contemplation, 17.1% were in contemplation, 33.6% were in preparation, 8.3% were in action and 38.2% were in maintenance.
Classical test theory
Means and standard deviations for each item and sub-scale and total scale scores for the self-efficacy items are presented in Table II. Along with this information, the corrected item-total correlations (CITCs) for each item for the total scale are presented. All items had adequate discrimination (i.e. all CITCs were >0.30). All CITCs were >0.65, except for one item (i.e. Item 10, find a partner; CITC = 0.34). The Cronbach's alpha values for the individual sub-scales as well as the overall scale were well above the accepted 0.70 recommended cutoff [33]. Cronbach's alpha values for the sub-scales ranged from 0.90 to 0.95, whereas the Cronbach's alpha for the 20-item scale was 0.96.
Confirmatory factor analysis
Results indicated that a second-order factor analysis model that allowed four error terms to be correlated provided an adequate fit [χ² goodness-of-fit test (df = 162) = 327.78, P < 0.05; RMSEA = 0.07 with 90% confidence interval = 0.06–0.08; NNFI = 0.96 and CFI = 0.96]. Allowing these error terms to be correlated provided an indication that these items shared a common variance that was not accounted for by the hypothesized factor structure. Results of the second-order factor analysis are presented in Fig. 1. All first-order factor loadings, representing the relationship among the items and the sub-scales, were high, except for Item 10, which asked about self-efficacy in "finding a partner to be active with you". This weaker association suggests that the item is not as strongly related to the underlying construct as the other items. In contrast, all second-order factor loadings were high (ranging from 0.83 to 0.94), thus supporting the hypothesis that the self-efficacy scale has a global factor composed of four sub-scales. Finally, the four correlations among error terms (added to improve the overall fit of the model) ranged from 0.18 to 0.41 and indicated that these items may tap similar content.
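The reported RMSEA can be checked approximately from the goodness-of-fit statistic, its degrees of freedom and the analysis sample size (n = 226). The Python sketch below uses a common point-estimate formula with n − 1 in the denominator; that choice, the function names and the label for values above 0.08 are assumptions, not a description of LISREL's internal computation.

```python
import math


def rmsea(chi_square, df, n):
    """Point estimate of RMSEA (assumes the n - 1 form of the denominator)."""
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


def describe_rmsea(value):
    """Apply the cutoffs stated in the Methods: <=0.05 good, <=0.08 reasonable."""
    if value <= 0.05:
        return "good"
    if value <= 0.08:
        return "reasonable"
    return "above the reasonable range"  # label chosen here for illustration


estimate = rmsea(327.78, 162, 226)
verdict = describe_rmsea(estimate)
```

With the values reported above, the estimate rounds to 0.07, which falls in the "reasonable" band of the stated criteria.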
To confirm the unidimensionality assumption, needed for the IRM analysis, a principal component analysis was conducted to determine how the 20 items loaded on a common self-efficacy scale. This methodology assessed if a main dimension existed but it did not preclude the presence of minor dimensions [40]. The factor loadings of the principal component analysis are presented in Table II (see last column). All loadings were high (>0.70) for all items, except Item 10 which had a loading of 0.36, indicating the items measured a single dominant factor. This factor explained 58% of the total variance. Note that a second factor would explain only 7% of the total variance, which further indicates that the scale has only one dominant factor.
Item response modeling
The magnitude of the deviance was lower for the partial credit model (deviance = 10 950, df = 82) than for the rating scale model (deviance = 11 073, df = 25). The difference in the deviance was 123 (11 073 − 10 950) with 57 (82 − 25) degrees of freedom, which is statistically significant at an alpha of 0.05 (using a χ² distribution), suggesting that the partial credit model fits the self-efficacy scale significantly better. Examination of the weighted fit indices (i.e. infit statistics) for the items, responses and items by response categories confirmed that the partial credit model fits the data better (see Table III). For the rating scale model, 25% of the items and all response categories had both the weighted mean square statistic and t statistic outside of the acceptable range; assuming equal distances across the responses therefore resulted in a poor fit. In contrast, for the partial credit model, only five out of 100 items by response categories (Items 3, 8, 10, 11 and 18) had both the weighted mean square statistic and t statistic outside of the acceptable range and this was observed mostly for Response Option 5. Finally, examination of the respondent weighted fit indices further confirmed that the partial credit model had a better fit. The percentage of respondent misfit with the partial credit model was 28 versus 31% for the rating scale model (again misfit was defined as having both the weighted mean square fit statistic and t statistic outside acceptable range). Finding that the partial credit model fits the data better indicated that the distances between response options were not the same across all items of the self-efficacy scale (e.g. the distance between "very confident" and "rather confident" for Item 1 is not equal to the distance between "very confident" and "rather confident" for Item 2). Further evaluation of the item fit indices revealed that Item 10, "finding a partner to be active with you", had the largest weighted mean square statistic and t statistic, indicating that this item may have been a poor measure of self-efficacy or that it measured something else.
In contrast to Item 10, the other four items flagged for having a poor fit had a weighted mean square value <0.75, which is less problematic than having a value >1.33 as the latter is indicative that the item may contribute less to the overall construct [41].
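The deviance comparison reported above can be reproduced with a short Python sketch. The critical value is obtained with the Wilson-Hilferty approximation to the chi-square distribution, an approximation chosen here to keep the example dependency-free; it is not the procedure used by ConQuest, and the function and variable names are hypothetical.

```python
from statistics import NormalDist


def chi_square_critical(df, alpha=0.05):
    """Approximate upper-tail chi-square critical value (Wilson-Hilferty)."""
    z = NormalDist().inv_cdf(1.0 - alpha)
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + z * c ** 0.5) ** 3


deviance_rating_scale = 11073    # rating scale model, 25 parameters
deviance_partial_credit = 10950  # partial credit model, 82 parameters

difference = deviance_rating_scale - deviance_partial_credit
df_difference = 82 - 25
critical = chi_square_critical(df_difference)

# The less constrained model is preferred when the drop in deviance
# exceeds the critical value for the difference in parameters.
partial_credit_fits_better = difference > critical
```

The observed difference of 123 on 57 degrees of freedom is well above the approximate critical value, consistent with the conclusion stated above.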
Figure 2 presents the item-respondent map for the self-efficacy scale. In this map, the items and respondents are on the same metric using a logit scale which is centered at a mean of zero. The items and respondents are most often located between −3 and +3 logits. Items located at zero would measure moderate-level self-efficacy and respondents at the same level would have moderate levels of self-efficacy. Items with positive logits would measure higher levels of self-efficacy and items with negative logits would measure lower levels of self-efficacy. The location of the items on the self-efficacy continuum serves to evaluate content representation along the continuum of self-efficacy. In the Rasch analysis, the raw scores are converted to a logit scale in the estimation process. By placing both the items and respondents on the same logit scale, the ordering of the data is preserved but the scale has interval properties. As shown in Fig. 2, the respondents' distribution is rotated at a 90° angle and shows those who have high self-efficacy at the top of the self-efficacy continuum and those who have lower self-efficacy at the bottom. The respondents' distribution is followed by the location of the items on the self-efficacy continuum and the location of the items by thresholds (the five response options are separated by four threshold points, where Threshold 1 refers to the threshold between Response Option 1 "not confident at all" and Response Option 2 "somewhat confident"). Given that the partial credit model fitted the data, it is most appropriate to compare the respondents' distribution with the items by thresholds. Comparing the respondents' distribution with the items by thresholds revealed that the scale may be misaligned (see Fig. 2). There appear to be 15 items (1–9 and 15–20) with thresholds of one that are targeting <1% of the respondents. As shown in Fig. 2, approximately two respondents have self-efficacy <−3.00 but a significant number of items with threshold of one are targeting these respondents. The opposite problem is observed with high levels of self-efficacy. Few items with a threshold of four are targeting respondents with self-efficacy of 1.75 and above, representing 15% of the respondents. As shown in Fig. 2, most of the items by thresholds assessed low or moderate self-efficacy, and fewer measured high self-efficacy. Such a restriction in the distribution of items by thresholds indicates that the scale had skewed content representation of the construct. In addition to having the items by threshold misaligned with the respondents, Fig. 2 showed that for all of the items, Threshold 1 targeted a self-efficacy of −2.50 and less. Given that <10 respondents had self-efficacy of −2.50 or lower, this suggested that Threshold 1 can be eliminated since it was poorly targeted. Threshold 1 represents the threshold between Response Option 1 "not confident at all" and Response Option 2 "somewhat confident"; therefore, the results suggested that the "not confident at all" response option was not chosen by those who had low self-efficacy in becoming physically active.
Further examination of the item locations revealed that items addressing moderate to high levels of self-efficacy seemed to focus on issues that were not within the individual's control and thus were more difficult to overcome (e.g. Item 10, being able to find an exercise partner; Item 12, remain active when family responsibility increases; Item 13, remain active when work responsibility increases, and Item 14, remain active during holidays). Furthermore, evaluation of the item locations served to test one of the underlying assumptions of this scale: that the location of the items on the self-efficacy continuum would be differentiated by stages, meaning that the contemplation items would require lower self-efficacy to overcome, followed by the preparation and action/maintenance items. In addition, the relapse items would more than likely overlap with the contemplation and preparation items. Evaluation of the item locations revealed that the contemplation and relapse items (Items 1–4 and 17–20) required less self-efficacy to overcome than the action/maintenance items (Items 12–16). The item locations for the preparation items (Items 5–11) overlapped with the other three sub-scales, suggesting that item locations on the self-efficacy continuum were not clearly differentiated by stages.
The functioning of the five-point response scale was assessed by examining the option characteristic curves for each item. Figures 3 and 4 show the option characteristic curves for Items 3 and 11. Fifteen of the 20 items had option characteristic curves similar to Item 3, as shown in Fig. 3. The pattern of responses for these items indicated that the "not confident at all" response option does not have the highest probability of being selected along the self-efficacy continuum for respondents >−3.00 logits; even respondents with high frequency of choosing the lowest levels of self-efficacy did not choose this response. In other words, this pattern of response suggests that these items functioned like a four-point rather than a five-point response format. The remaining items had option characteristic curves that were similar to Item 11, as shown in Fig. 4. Given that all responses at some point along the continuum of self-efficacy had the highest probability of being selected, it appears that for these five items the five-point response format functioned well. As suggested by Zhu et al. [42], a post hoc analysis was conducted to confirm that collapsing the "not confident at all" response option for 15 of the items improved the fit of the data. Given that this change affected the extreme lower part of the distribution (targeting <1% of the respondents), it had a minimal impact on the data (data not shown).
Figures 5 and 6 show the standard errors of measurement and the conditional reliability, respectively, for the items as a function of self-efficacy. As expected, the standard errors of measurement are inversely related to the reliability, with higher measurement errors associated with lower reliability. High levels of self-efficacy were measured less precisely. The 95% confidence interval range for respondents who have a −2.00 on the self-efficacy scale is −2.57 to −1.43, whereas the range for a self-efficacy of 2.00 is 1.18 to 2.82. As shown in Fig. 6, the conditional reliability of the test ranged from 0.76 to 0.93, with the highest conditional reliability observed for respondents with lower self-efficacy. The conditional reliability decreased slightly for respondents who had high levels of self-efficacy but, in general, it remained optimal (i.e. reliability >0.70) at all levels of self-efficacy. The overall person separation reliability [43] was 0.98 for the self-efficacy scale.
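The two confidence intervals reported above can be related back to the conditional reliability expression given in the Methods. The Python sketch below assumes symmetric normal-theory intervals (estimate ± 1.96 × standard error), which the paper does not state explicitly; the function names are hypothetical.

```python
Z_95 = 1.96  # normal quantile assumed for a two-sided 95% interval


def se_from_half_width(half_width):
    """Back out the conditional standard error from a 95% interval half-width."""
    return half_width / Z_95


def conditional_reliability(se):
    """Conditional reliability as defined in the Methods: 1 / (1 + SE^2)."""
    return 1.0 / (1.0 + se ** 2)


# Half-widths of the two reported intervals: 0.57 logits for the narrower
# interval and 0.82 logits for the wider one.
reliability_narrow = conditional_reliability(se_from_half_width(0.57))
reliability_wide = conditional_reliability(se_from_half_width(0.82))
```

Under these assumptions, the narrower interval corresponds to a conditional reliability of about 0.92 and the wider one to about 0.85, both within the reported range of 0.76 to 0.93.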
Discussion
The purpose of this paper was to assess the psychometric properties of a stage-specific self-efficacy scale with IRM and to contrast these findings with CTT item analysis and CFA. The CTT results indicated that the stage-specific self-efficacy scale had high internal consistency and that all items discriminated well. The factor analysis results confirmed the factor structure of the scale, although some weakness in the original hypothesized structure was identified. The IRM item analysis confirmed that the scale had high reliability and a strong dimension but found that (i) one item appeared to measure a different construct (the same one that had a low discrimination in CTT and the lowest factor loading in CFA), (ii) the scale was misaligned with the respondents' distribution and was found to lack items that specifically targeted high self-efficacy, (iii) the location of the items on the self-efficacy continuum did not support the stage-specific assumption and (iv) the five-point response format did not appear to be appropriate for most items. In general, the IRM analysis provided an in-depth evaluation of the psychometric properties of the self-efficacy scale and an assessment complementary to the CTT analysis and CFA.
In general, the self-efficacy scale had a strong dimension. The item that assessed confidence in finding a partner to be physically active with you was the only item found to measure another construct. Although the CTT item analysis and CFA identified this item as weaker (i.e. low discrimination and weak factor loading), these methods did not clearly identify the item as measuring another construct. This issue became apparent when IRM was used to assess the dimensionality of the scale. In evaluating the content of the scale, it appeared that this item asked about self-efficacy in overcoming a barrier not totally within the individual's control. It may not be easy to find a physical activity partner within one's social network. Moreover, finding other people willing to be active with an individual does not mean that it will be easy for the individuals to coordinate their activity schedules, given the individuals' different levels of motivation and scheduling conflicts. This item appeared to measure self-efficacy in maintaining a physical activity social network; however, because only one item measured this construct, it provided a weak assessment of the construct. To remedy this situation, this item should be eliminated from the scale as it does not assess the same construct as the other items. This dimension may be important to measure, but it is not the focus of the scale.
As shown in Table III, four pairs of items were found to have correlated error terms. The first correlated pair addressed the ability to remain active when family responsibilities are more demanding and when work responsibilities are more demanding. In this population of minority women, it may be more appropriate to collapse these items to ask about self-efficacy in maintaining an active lifestyle when work and family responsibilities are more demanding. Although the two items referred to different content areas, the high residual correlated error term suggested that women may not have differentiated between work and family responsibilities because household chores can be perceived as both [44]. The second correlated pair was find time in busy schedule and rearrange schedule to be active. Given the overlapping content of these items, eliminating one would not affect the content representation of the scale, which should be considered in future scale administrations. The third correlated pair was find a safe place and find a convenient place. Although these items addressed two separate issues, it appeared that the women could not separate these issues. Safety is a barrier [45] that often is mentioned among women; therefore, finding a convenient place also means that the location must be safe. It typically is not recommended that a question include two statements [33], but in this case, convenience and safety cannot be dissociated in answering the question; therefore, these two questions should be combined. The last correlated item pair was find a program you enjoy and find a convenient place. It appeared that, when women look for a convenient place to exercise or be physically active, they also look for a place that is convenient and that offers activities they would enjoy. 
Although enjoyment has been associated strongly with physical activity participation [46] and people probably select activities they enjoy, finding one or several activities that a person would enjoy for a long time may be difficult, given that people may become bored with their activities. Although both issues (i.e. convenience and enjoyment) are important to consider when selecting a location to be active, it may be best to keep the items separate because they address distinct content areas that do not necessarily overlap. Alternatively, it is possible that all correlated error terms resulted because the structure of these items is similar and the items are administered adjacent to one another. This may have increased the likelihood of responding similarly to these items and provides an alternative explanation for our findings. Overall, the content evaluation suggested that one item be eliminated, that two items be combined and that the order in which the items are administered be modified to see if this would decrease the correlated error terms.
Evaluation of the item locations from the IRM analysis revealed that the items targeted lower self-efficacy but, most importantly, that the locations of the items were not differentiated by stage. Specifically, the item locations for the contemplation sub-scale were not all lower than those for the preparation and action/maintenance sub-scales, and the item locations for the preparation sub-scale were not all lower than those for the action/maintenance sub-scale. The items of the relapse sub-scale were assumed to have locations similar to those of the contemplation and preparation sub-scales, which appeared to be supported by the data. Although some differentiation by stages was found, overall there was too much overlap in the item locations to support the stage-specific hypothesis. IRM was the only procedure that was able to evaluate this underlying assumption and its rejection suggested that the scale may have limited utility in practice. Although the CFA results supported sub-scale scores, it is possible that having different stems for the item sets increased the correlation and reliability among the items in each set. This may have improved the fit of the CFA and the Cronbach's alpha; however, the validity of the sub-scale scores did not appear to be supported by the IRM analysis, which highlights the complementary information that the IRM analysis provides.
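The stage-specific assumption discussed above amounts to an ordering constraint on item locations, which can be illustrated with a minimal sketch. The location values, the dictionary `ITEM_LOCATIONS` and both function names below are hypothetical and are not the estimates or procedures reported in this paper; the strict "every item below every later-stage item" rule is one simple way to operationalize the assumption.

```python
from itertools import combinations

# Hypothetical item locations (in logits) per sub-scale; NOT the paper's estimates.
ITEM_LOCATIONS = {
    "contemplation": [-1.2, -0.4, 0.3],
    "preparation": [-0.9, -0.1, 0.5],
    "action_maintenance": [-0.6, 0.2, 0.9],
}
STAGE_ORDER = ["contemplation", "preparation", "action_maintenance"]


def overlapping_pairs(locations, order):
    """Return the (earlier, later) stage pairs whose item-location ranges overlap.

    Two stages overlap when the highest item location of the earlier stage is
    not below the lowest item location of the later stage.
    """
    return [
        (earlier, later)
        for earlier, later in combinations(order, 2)
        if max(locations[earlier]) >= min(locations[later])
    ]


def stages_are_separated(locations, order):
    """True only if every earlier-stage item lies below every later-stage item."""
    return not overlapping_pairs(locations, order)


if __name__ == "__main__":
    print(stages_are_separated(ITEM_LOCATIONS, STAGE_ORDER))
    print(overlapping_pairs(ITEM_LOCATIONS, STAGE_ORDER))
```

With the illustrative values, the sub-scale ranges overlap, which mirrors the kind of pattern that would lead to rejecting a stage-specific ordering. A less strict criterion (for example, comparing sub-scale mean locations) would be a different design choice.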
Another finding of the IRM evaluation was that few items measured high levels of self-efficacy, even though the scale had high internal consistency. Evaluating this information is useful in developing scales that are sensitive to change, which is necessary to assess the impact of physical activity interventions. Determining the levels of self-efficacy that these items target can be used both to develop scales that are sensitive to change and to refine an intervention by gaining a better understanding of which items require more self-efficacy than others. Note that reduced content representation (such as having few items measuring high self-efficacy) decreased the reliability and increased measurement errors for those who have high self-efficacy, but the reliability remained adequate (>0.70). This occurred probably because many of the lower self-efficacy items also measured some level of high self-efficacy. Although maintaining an adequate reliability is important in developing a scale that is sensitive to change, it is equally important to decrease measurement errors and to include items that address all levels of self-efficacy. Therefore, even though the scale had adequate reliability, its content should be modified to provide adequate representation at the top end of the construct and to decrease measurement errors, especially for high levels of self-efficacy.
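The link between item targeting and measurement error can be sketched numerically. The example below assumes a dichotomous Rasch model, which is a simplification (five-point items would call for a polytomous formulation), and uses hypothetical item locations rather than values from this study. Under that assumption, the information an item contributes at trait level theta is p(1 - p), and the standard error is the reciprocal of the square root of the summed information.

```python
import math


def rasch_probability(theta, location):
    """Probability of endorsing an item under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - location)))


def scale_information(theta, locations):
    """Sum of item information p * (1 - p) at trait level theta."""
    total = 0.0
    for location in locations:
        p = rasch_probability(theta, location)
        total += p * (1.0 - p)
    return total


def standard_error(theta, locations):
    """Standard error of measurement at theta: 1 / sqrt(information)."""
    return 1.0 / math.sqrt(scale_information(theta, locations))


# Hypothetical item locations (logits): a scale targeting low self-efficacy,
# and the same scale with two added items located at the high end.
LOW_TARGETED = [-2.0, -1.5, -1.0, -0.5, 0.0]
WITH_HARD_ITEMS = LOW_TARGETED + [1.5, 2.0]

if __name__ == "__main__":
    for theta in (-1.0, 0.0, 2.0):
        print(
            theta,
            round(standard_error(theta, LOW_TARGETED), 2),
            round(standard_error(theta, WITH_HARD_ITEMS), 2),
        )
```

In this sketch the standard error for a respondent with high self-efficacy is larger than for one located near the items, and it shrinks when items located at the high end are added, which is the rationale for improving representation at the top of the construct.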
IRM was instrumental in evaluating the five-point response format. In general, many items were found to function more like a four-point response format because the response option not at all confident was rarely selected even by those with low self-efficacy. This option may not have been perceived as socially appropriate, or it may have been perceived as containing language too strong to reflect their level of confidence and thus was never selected. Although the women may have lacked confidence in overcoming some of the barriers listed, they did not label themselves as not confident at all. Thus, it appears that although a five-point response scale was administered, the respondents used it mainly as a four-point scale by ignoring one of the anchors. In future revisions, it may be important to find another label for not confident at all to avoid eliciting an adverse reaction from respondents as was observed in this paper.
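A simple descriptive companion to the response-format finding is to tabulate how often each response option is chosen. The sketch below is not the IRM category analysis used in this paper; the response data, the 5% cut-off and the function names are all hypothetical and serve only to illustrate how a rarely used anchor would show up.

```python
from collections import Counter

CATEGORIES = (1, 2, 3, 4, 5)  # 1 = lowest confidence anchor, 5 = highest


def category_proportions(responses, categories=CATEGORIES):
    """Proportion of responses falling in each response category."""
    if not responses:
        raise ValueError("responses must not be empty")
    counts = Counter(responses)
    total = len(responses)
    return {category: counts.get(category, 0) / total for category in categories}


def rarely_used(responses, threshold=0.05, categories=CATEGORIES):
    """Categories selected by less than `threshold` of respondents.

    The 0.05 default is an arbitrary illustrative cut-off, not a standard.
    """
    proportions = category_proportions(responses, categories)
    return [category for category, share in proportions.items() if share < threshold]


# Hypothetical responses to one item from 100 respondents.
example_responses = [1] * 2 + [2] * 18 + [3] * 30 + [4] * 30 + [5] * 20

if __name__ == "__main__":
    print(category_proportions(example_responses))
    print(rarely_used(example_responses))
```

A category flagged this way would still need to be examined with the IRM results before deciding to relabel or collapse it.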
A major finding of this paper is that a comprehensive assessment was needed to uncover some of the strengths and weaknesses associated with the self-efficacy scale. For example, the results of the CTT item analysis and the CFA did not reveal that the scale may be less sensitive at high levels of self-efficacy, which in turn may impact the ability to detect change over time, or that the five-point response format was not well targeted. IRM was the only method that rejected the stage-specific properties of the scale. In contrast, the CFA identified an issue that was not clearly uncovered by the other approaches: overlap in item content, as identified by the correlated error terms. The paper also identified ways to improve the scale so that it might better assess stage-specific self-efficacy. It should be noted that omitting the pre-contemplation stage and combining the action and maintenance stages do not totally adhere to the stages of change [17], and future endeavors to develop a stage-specific self-efficacy scale should consider including all stages in the development. The purpose of this paper was to highlight how these methods complement each other and to emphasize that a full psychometric evaluation can be better served by utilizing all these methods. The methods used to assess the psychometric properties of the self-efficacy scale focused solely on determining the structural properties and internal consistency of the scale. As indicated in Benson [47], fully assessing the properties of the scale also requires evaluating the substantive and external validity; this paper focused on only one aspect of construct validity.
| Conflict of interest statement |
|---|
None declared.
| Acknowledgements |
|---|
This work was funded by the Women's Health Initiative through the CDC, CDC U48/CC609653. This work does not necessarily reflect the views or policies of the funding agencies.
| References |
|---|
1. Bandura A. Social Foundations of Thought and Action: A Social Cognitive Theory. Englewood Cliffs, NJ: Prentice-Hall 1986.
2. Prochaska JO and DiClemente CC. Stages and processes of self-change of smoking: toward an integrative model of change. J Consult Clin Psychol 1983 51:390–5.
3. The health belief model and personal health behavior. In Becker MH (Ed.). Health Educ Monogr. 1974 2: pp. 1508.
4. Ajzen I. The theory of planned behavior. Organ Behav Hum Decis Process 1991 50:179–211.
5. Trost SG, Owen N, Bauman AE, et al. Correlates of adults' participation in physical activity: review and update. Med Sci Sports Exerc 2002 34:1996–2001.
6. Poag-DuCharme KA and Brawley LR. Self-efficacy theory: use in the prediction of exercise behavior in the community setting. J Appl Sport Psychol 1993 5:178–94.
7. Sallis JF, Hovell MF, Hofstetter CR. Predictors of adoption and maintenance of vigorous physical activity in men and women. Prev Med 1992 21:237–51.
8. Sallis JF, Hovell MF, Hofstetter CR, et al. Explanation of vigorous physical activity during two years using social learning variables. Soc Sci Med 1992 34:25–32.
9. McAuley E. Self-efficacy and the maintenance of exercise participation in older adults. J Behav Med 1993 16:103–13.
10. Marcus BH, Eaton CA, Rossi JS, et al. Self-efficacy, decision-making, and stages of change: an integrative model of physical exercise. J Appl Soc Psychol 1994 24:489–508.
11. Muto T, Saito T, Sakurai H. Factors associated with male workers' participation in regular physical activity. Ind Health 1996 34:307–21.
12. McAuley E and Jacobson L. Self-efficacy and exercise participation in sedentary adult females. Am J Health Promot 1991 5:185–91.
13. Marshall SJ and Biddle SJH. The transtheoretical model of behavior change: a meta-analysis of applications to physical activity and exercise. Ann Behav Med 2001 23:229–46.
14. Miller YD, Trost SG, Brown WJ. Mediators of physical activity behavior change among women with young children. Am J Prev Med 2002 23:98–103.
15. Wilbur J, Miller AM, Chandler P, et al. Determinants of physical activity and adherence to a 24-week home-based walking program in African American and Caucasian women. Res Nurs Health 2003 26:213–24.
16. Laffrey SC and Asawachaisuwikrom W. Development of an exercise self-efficacy questionnaire for older Mexican American women. J Nurs Meas 2001 9:259–73.
17. Marcus BH, Selby VC, Niaura RS, et al. Self-efficacy and the stages of exercise behavior change. Res Q Exerc Sport 1992 63:60–6.
18. Resnick B and Jenkins LS. Testing the reliability and validity of the self-efficacy for exercise scale. Nurs Res 2000 49:154–9.
19. Sallis JF, Pinski RB, Grossman RM, et al. The development of self-efficacy scales for health-related diet and exercise behaviors. Health Educ Res 1988 3:283–92.
20. Spray JA. Recent development in measurement and possible applications to the measurement of psychomotor behavior. Res Q Exerc Sport 1987 67:363–72.
21. Mâsse LC, Dassa C, Gauvin L, et al. Emerging measurement and statistical methods in physical activity research. Am J Prev Med 2002 23:Suppl. 2, 44–55.
22. Bond TG and Fox CM. Applying the Rasch Model: Fundamental Measurement in the Human Sciences. Mahwah, NJ: Erlbaum 2001.
23. Reeve RB and Mâsse LC. Applications of item response modeling (IRM) modeling for questionnaire evaluation. In Presser S, Rothgeb JM, Couper MP (Eds.). Methods for Testing and Evaluating Survey Questionnaires. Hoboken, NJ: John Wiley & Sons 2004 pp. 247–73.
24. Embretson SE and Reise SP. Item Response Modeling for Psychologists. Mahwah, NJ: Erlbaum 2000.
25. Hays RD, Morales LS, Reise SP. Item response modeling and health outcomes measurement in the 21st century. Med Care 2000 38:Suppl. 2, 28–41.
26. Safrit MJ, Cohen AS, Costa MG. Item response modeling and the measurement of motor behavior. Res Q Exerc Sport 1989 60:325–35.
27. Safrit MJ, Zhu W, Costa MG, et al. The difficulty of sit-ups tests: an empirical investigation. Res Q Exerc Sport 1992 63:277–83.
28. Spray JA. One-parameter item response modeling models for psychomotor tests involving repeated, independent attempts. Res Q Exerc Sport 1990 61:162–8.
29. Zhu W and Cole EL. Many-faceted Rasch calibration of a gross motor instrument. Res Q Exerc Sport 1996 67:24–34.
30. Zhu W and Safrit MJ. The calibration of a sit-up task using the Rasch Poisson counts model. Can J Appl Physiol 1993 18:207–19.
31. Escobar-Chaves SL, Tortolero SR, Mâsse LC, et al. Recruiting and retaining minority women: findings from the Women On The Move study. Ethn Dis 2002 12:242–51.
32. Mâsse LC and Anderson CB. Effect of ethnicity, education, and income on mediators of physical activity in women. Am J Health Promot 2003 17:357–60.
33. Nunnally JC and Bernstein IH. Psychometric Theory. 3rd edn. New York, NY: McGraw-Hill 1994.
34. Software Scientific International. Lisrel (Version 8.30). Lincolnwood, IL: Scientific Software International, Inc 1999 [Computer software].
35. Hu L and Bentler PM. Evaluating model fit. In Hoyle RH (Ed.). Structural Equation Modeling: Concepts, Issues, and Applications. Thousand Oaks, CA: Sage Publication 1995 pp. 76–99.
36. Wu ML, Adams RJ, Wilson MR. Acer Conquest: Generalized Item Response Modelling Software. Melbourne, Australia: ACER 1998.
37. Masters GN. A Rasch model for partial credit scoring. Psychometrika 1982 49:359–81.
38. Andrich DA. A rating formulation for ordered response categories. Psychometrika 1978 43:561–73.
39. Adams RJ and Khoo S. Quest. Melbourne, Australia: ACER 1996.
40. Reckase M. Unifactor latent trait models applied to multifactor tests: results and implications. J Educ Stat 1979 4:207–30.
41. Wilson M. Constructing Measures: An Item Response Modeling Approach. Mahwah, NJ: Erlbaum 2005.
42. Zhu W, Timm G, Ainsworth B. Rasch calibration and optimal categorization of an instrument measuring women's exercise perseverance and barriers. Res Q Exerc Sport 2001 72:104–16.
43. Wright BD and Masters GN. Rating Scale Analysis. Chicago, IL: MESA Press 1982.
44. Henderson KA and Ainsworth BE. A synthesis of perceptions about physical activity among older African American and American Indian women. Am J Public Health 2003 93:313–7.
45. Wilcox S, Richter DL, Henderson KA, et al. Perceptions of physical activity and personal barriers and enablers in African-American women. Ethn Dis 2002 12:353–62.
46. Sallis JF and Owen N. Physical Activity & Behavioral Medicine. Thousand Oaks, CA: Sage Publications 1999.
47. Benson J. Developing a strong program of construct validation: a test anxiety example. Educ Meas Issues Pract 1998 17:10–22.
Received on March 28, 2006; accepted on August 18, 2006