Health Education Research Advance Access originally published online on October 31, 2006
Health Education Research 2006 21(Supplement 1):i121-i124; doi:10.1093/her/cyl141
© 2006 The Author(s).
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Improving measurement methods for behavior change interventions: opportunities for innovation
1 Klein Buendel, Inc., 1667 Cole Boulevard, Suite 225, Golden, CO 80401, USA
2 Health Behavior and Health Education School of Public Health, University of Michigan, Ann Arbor, MI, USA
3 Department of Epidemiology and Cancer Control, St Jude Children's Research Hospital, Memphis, TN, USA
* Correspondence to: A. L. Dunn. E-mail: adunn{at}kleinbuendel.com
Theoretically based behavioral interventions have demonstrated effectiveness in stopping smoking, increasing physical activity and improving nutrition in order to prevent major chronic diseases such as cardiovascular disease, cancer and diabetes [1–7]. Despite the development of effective behavior change programs, the magnitude of change in these behaviors has been relatively modest [8, 9]. The rising rates of diabetes and obesity, for example, indicate there is still much to learn about the mechanisms of behavior change and how to maintain newly acquired behavioral skills. One problem that has slowed behavioral intervention research is that the validity and reliability of our measures have sometimes lagged behind other innovations, such as the development of effective tailored interventions [10, 11] or analytical techniques for assessing moderators and mediators of behavior change, which makes it difficult to understand the mechanisms of behavior change and to improve our interventions [12–15].
This special issue of Health Education Research provides an opportunity to consider an important advancement in our behavioral measurement methods: specifically, how we can apply item response models to improve our psychometric methods in health education and health behavior research and practice. Although item response modeling (IRM) has been used in educational testing over the last three decades [16], it is an emerging method in health education and health behavior research [17]. In health research, for example, IRM is being widely adopted to improve and revise quality of life questionnaires [18–20]. There are many other innovative applications of IRM and, as the nine papers in this issue show us, these applications can aid our understanding of the psychometric properties of scales beyond making questionnaires shorter and beyond what most of us learned in our survey development courses based on classical test theory (CTT). In this brief afterword, we highlight current and future applications of IRM and discuss how these methods might help us improve the efficiency of our research. We believe these methods could lead to considerable improvements in intervention methods, in understanding the mechanisms of behavior change and in developing and refining theoretical models to make them more parsimonious. They could also provide a foundation for considering the feasibility of computerized adaptive testing for measures of behavioral constructs.
For those who are not familiar with IRM methods, the two papers by Wilson and colleagues [21, 22] provide an excellent tutorial. They present an example that uses the Rasch one-parameter model to examine a measure of self-efficacy with both dichotomous and polytomous models, and then compare IRM and CTT methods. In the first paper, readers learn the basics of IRM, including the relationship of individual items as well as the role of item difficulty and response patterns. In other words, item response models provide an estimate of the individual's level on the underlying construct and an estimate of each item's difficulty on that same construct. The implications of these features of IRM are likely to be far reaching for health promotion research and practice. Researchers will be able to determine whether items contribute to an overall understanding of a construct and whether there are gaps in item difficulty, which could enable a more informed decision about the development of new items. We see the beginnings of work that builds toward precise measures that yield equal accuracy at all levels [23]. For example, in quality of life measures, the goal is to have equally precise measures across the continuum from chronically ill to healthy individuals. A similar goal could be established for behavioral constructs like self-efficacy, barriers, benefits, enjoyment and many others. The goal is to increase the reliability of test scores and to improve the amount of information obtained along the continuum of the construct measured by the scale, potentially improving the scale's ability to detect behavior change.
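For readers who prefer symbols, the dichotomous Rasch model discussed above is conventionally written as shown below. The notation is our own shorthand and is not taken from the tutorial papers [21, 22]: \theta_n is person n's level on the construct, b_i is the difficulty of item i, and X_{ni} is the response (1 = endorsed), with person and item parameters on the same logit scale.

```latex
\[
P(X_{ni} = 1 \mid \theta_n, b_i) = \frac{\exp(\theta_n - b_i)}{1 + \exp(\theta_n - b_i)}
\]
\[
I_i(\theta) = P_i(\theta)\,\bigl(1 - P_i(\theta)\bigr),
\qquad P_i(\theta) = P(X_{ni} = 1 \mid \theta, b_i)
\]
```

The second expression is the item information function under this model. It peaks where \theta = b_i, which is one way to see why a gap in item difficulties leaves part of the continuum measured with less precision.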
Another potential application of IRM is in the design of tailored interventions. Behavioral scientists have recognized the promise of tailored interventions aimed at discriminating key motivational strategies for individuals, but we have lacked methods to effectively differentiate individuals on the constructs targeted by these strategies [11, 24, 25]. For example, it is unlikely that current measures of self-efficacy for physical activity clearly differentiate levels among individuals with either high or low self-efficacy. In the case of two people who both score 90 out of 100 on a self-efficacy scale for physical activity, we might want to tailor the self-efficacy advice somewhat differently depending on how they achieved that level of self-efficacy, e.g. did confidence increase because of a change of circumstances, or because they have been able to work through problems over a long period of time and have developed a lifelong habit? Tailoring interventions is not a new concept and it can be done with CTT, but one of the greatest advantages of IRM is that both the items and total scores are on the same metric and the properties of the items are sample invariant when the IRM fits the data. Having the items and total scores on the same metric means that we know which items a person with low self-efficacy is likely to endorse, which strengthens our ability to develop tailored interventions. Therefore, by applying IRM methods, we can build databases to better understand the properties of individual items and the location of these items along a difficulty or frequency of endorsement continuum. Improved precision of items can lead to improved tailoring of interventions. Furthermore, this process can serve as the foundation for developing item banks that could be a shared resource among researchers, and it is instrumental for the development of computerized adaptive testing.
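The point about items and persons sharing one metric can be illustrated with a minimal sketch, assuming a dichotomous Rasch model. The item wordings and difficulty values below are hypothetical and are not taken from any of the scales discussed in this issue; the 0.5 cut-off is likewise an arbitrary illustration.

```python
import math

def endorsement_probability(theta, difficulty):
    """Rasch probability that a person at `theta` endorses an item at
    `difficulty`; both values are on the same logit scale."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))

# Hypothetical self-efficacy items with made-up difficulties (logits).
ITEM_DIFFICULTIES = {
    "walk 10 minutes on a good day": -2.0,
    "exercise when tired": 0.0,
    "exercise when the weather is bad": 1.0,
    "exercise daily for six months": 2.5,
}

def likely_endorsed(theta, items=ITEM_DIFFICULTIES, threshold=0.5):
    """Return the items whose endorsement probability exceeds `threshold`."""
    return [
        name
        for name, difficulty in items.items()
        if endorsement_probability(theta, difficulty) > threshold
    ]

low = likely_endorsed(-1.0)   # person with low self-efficacy
high = likely_endorsed(1.5)   # person with higher self-efficacy
```

Under these assumed values, the person at -1.0 logits is expected to endorse only the easiest item, while the person at 1.5 logits is expected to endorse the first three. A tailoring algorithm could use this kind of list to pick messages that target the first item a person is unlikely to endorse.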
The assembly of item banks is an important first step in computerized adaptive testing, which has become the norm in educational testing such as the national nursing licensing examinations and the Graduate Record Examination. To illustrate, items on each of these exams are ranked by difficulty. Individuals who answer easy items correctly move to the next level of difficulty and do not waste time on easier questions. The outcome is a reduction in testing time and increased precision (higher reliability and lower standard error of measurement). Computerized adaptive testing is under development in quality of life assessment and could also be used for the assessment of many constructs used in health promotion research and practice. This long-term goal would require an iterative process that begins with developing an initial item bank for the different theoretical constructs to be measured. Gaps in the item bank would then be identified using IRM methods and new items developed where needed. IRM could then be used again to determine difficulty ratings of banked items in order to begin developing more efficient construct measures [23]. For example, individuals who show high levels of efficacy may be administered only items from the more difficult end of the efficacy distribution. A key issue here is whether behavioral health represents skills or abilities that map onto concepts of difficulty and whether the field is able to move forward in exploring innovations in measurement methodology.
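The adaptive logic described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a description of how any operational examination works: it assumes a dichotomous Rasch model, selects the unused item with maximum information at the current estimate, and updates the estimate as a posterior mean over a grid with a standard normal prior. The function names and the item bank are hypothetical.

```python
import math

def p_endorse(theta, b):
    """Rasch probability of endorsing an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def eap_estimate(responses, grid=None):
    """Posterior-mean estimate of theta on a grid, with a standard normal
    prior. `responses` is a list of (difficulty, 0 or 1) pairs."""
    grid = grid or [g / 10.0 for g in range(-40, 41)]
    weights = []
    for theta in grid:
        w = math.exp(-0.5 * theta * theta)  # unnormalised N(0, 1) prior
        for b, x in responses:
            p = p_endorse(theta, b)
            w *= p if x == 1 else (1.0 - p)
        weights.append(w)
    return sum(t * w for t, w in zip(grid, weights)) / sum(weights)

def next_item(theta, bank, administered):
    """Choose the unused item with the largest Rasch information p(1 - p)."""
    def information(item):
        p = p_endorse(theta, bank[item])
        return p * (1.0 - p)
    candidates = [item for item in bank if item not in administered]
    return max(candidates, key=information)

def run_cat(bank, answer, max_items=4):
    """Administer up to `max_items` items; `answer(item)` returns 0 or 1."""
    theta, administered, responses = 0.0, [], []
    for _ in range(min(max_items, len(bank))):
        item = next_item(theta, bank, administered)
        administered.append(item)
        responses.append((bank[item], answer(item)))
        theta = eap_estimate(responses)
    return theta, administered

# Hypothetical five-item bank with difficulties in logits.
bank = {"a": -2.0, "b": -1.0, "c": 0.0, "d": 1.0, "e": 2.0}
theta, order = run_cat(bank, answer=lambda item: 1, max_items=3)
```

With this assumed bank, a respondent who endorses everything is routed from the middle item to progressively harder ones and never sees the two easiest items, which is the time saving the paragraph describes. A fixed test length is used here for simplicity; a stopping rule based on the standard error would be a natural alternative.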
The papers by Heesch, Mâsse and Dunn [26] and Watson, Baranowski and Thompson [27] provide examples of the IRM method applied to several different scales and demonstrate how we might use IRM to evaluate the psychometric properties of our measures. For example, in the paper by Heesch et al., three scales are evaluated, namely, physical activity enjoyment [28], barriers to physical activity and benefits of physical activity [29]. The Rasch analyses of these scales demonstrated the unidimensionality of the enjoyment and benefits scales, but the barriers scale had two separate factors that seem to be related to time demands and barriers internal to the person. Further, the analyses indicated problematic items within the scales that seemed to tap other minor dimensions. These items could be dropped in a future scale revision. While this finding could also have been determined with CTT, IRM analyses additionally revealed that the perceived benefits scale was limited in its ability to measure the underlying construct in a large sample of sedentary adults because items were not providing adequate content coverage for those having moderately high to high levels of perceived benefits. The enjoyment scale also did not show good item content coverage for those on the low or high end of physical activity enjoyment. As all of these papers have shown, IRM provides detailed information about the performance of the items by identifying difficulty gaps, which is an important first step toward improving testing in health promotion research and practice by increasing our understanding of the performance of particular items. In addition to identifying difficulty gaps, IRM also helps to identify response gaps. For example, the Rasch model not only provides information about whether the distances between responses are equal or not (using the rating scale versus partial credit models) but also tells us about the functioning of the response format using item characteristic curves.
By having this information in addition to what CTT provides, we can greatly enhance the efficiency of our measurements and our understanding of them. This lesson is made even clearer in the paper by Mâsse, Heesch and Eason [30] comparing confirmatory factor analysis (CFA) with CTT and IRM. Although all of these methods provide useful information on the psychometric properties of scales, IRM provides additional, complementary information that cannot be provided by CFA or CTT. IRM extends our capabilities but does not replace the need to consider CFA or CTT approaches.
IRM also provides a method for enhancing our understanding of how various scales purporting to measure comparable constructs might be similar or different. Currently, we have difficulty comparing intervention outcomes (or surveys for that matter) that have administered different scales. Methods of scale linkage are highly desirable when trying to evaluate and contrast the results of several studies. The paper by Mâsse, Allen and Wilson [31] demonstrates how two eight-item self-regulatory scales can be linked and shows that this is possible when the measures have at least four items in common. Behavioral science researchers often use different scales of the same construct, and frequently items within the scales are similar. As additional scale linkage papers become available, the IRM linking method is likely to be especially useful in meta-analytic reviews for evaluating the comparability of results across studies. This is relevant to adaptive testing, as this linking could allow us to analyze variables measured with different versions of instruments.
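To make the idea of linking through common items concrete, here is a minimal sketch of mean-mean linking under a Rasch model, in which two calibrations of the same items are assumed to differ only by a constant shift. This is a simplified stand-in and should not be read as the equating procedure used in [31]; the item labels and difficulty values are hypothetical.

```python
def linking_constant(difficulties_a, difficulties_b):
    """Average difference in difficulty (scale A minus scale B) over the
    items the two calibrations have in common."""
    common = sorted(set(difficulties_a) & set(difficulties_b))
    if not common:
        raise ValueError("the two scales share no items")
    diffs = [difficulties_a[item] - difficulties_b[item] for item in common]
    return sum(diffs) / len(diffs)

def to_scale_a(values_b, shift):
    """Re-express scale B values (item difficulties or person estimates)
    on the metric of scale A."""
    return {key: value + shift for key, value in values_b.items()}

# Hypothetical calibrations: four common items (q1 to q4) plus one item
# unique to scale B (q9).
scale_a = {"q1": -1.0, "q2": 0.0, "q3": 0.5, "q4": 1.5}
scale_b = {"q1": -1.5, "q2": -0.5, "q3": 0.0, "q4": 1.0, "q9": 2.0}

shift = linking_constant(scale_a, scale_b)
scale_b_on_a = to_scale_a(scale_b, shift)
```

Once the shift is known, person estimates from studies that used scale B can be moved onto the scale A metric in the same way, which is what would make pooled or meta-analytic comparisons possible. In practice the stability of the shift across the common items would need to be checked before relying on it.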
We are enthusiastic about the potential of IRM to advance the behavioral science measurement field. IRM analyses hold exciting potential to improve our interventions, our understanding of the mechanisms of behavior change and our evaluation of the theoretical basis for behavior change research. We are grateful for the leadership of Drs. Mâsse, Baranowski and Wilson in providing a stimulating set of papers, and we urge all who are interested in furthering the field of behavioral science to consider IRM to advance our knowledge of the psychometric properties of our measurements. In the near term, there are many existing data sets that could be used to continue building our knowledge base, and we would urge the use of complementary approaches that employ CTT, CFA and IRM methods to gain a more complete picture of the performance of these scales in different populations. In the longer term, it will be possible to develop item banks and computerized adaptive tests for research. Such an effort will require leadership to build the necessary infrastructure. Given that behavior underlies much of the chronic disease in the world, it is easy to understand the necessity of reaching these intermediate and distant goals.
References
1. Lancaster T and Stead LF. Individual behavioural counselling for smoking cessation. Cochrane Database Syst Rev 2005 CD001292.
2. Stead LF and Lancaster T. Group behaviour therapy programmes for smoking cessation. Cochrane Database Syst Rev 2005 CD001007.
3. Gotay CC. Behavior and cancer prevention. J Clin Oncol 2005 23:301–10.
4. Dzewaltowski DA, Estabrooks PA, Klesges LM, et al. Behavior change intervention research in community settings: how generalizable are the results? Health Promot Int 2004 19:235–45.
5. Katz DL, O'Connell M, Yeh MC, et al. Public health strategies for preventing and controlling overweight and obesity in school and worksite settings: a report on recommendations of the Task Force on Community Preventive Services. MMWR Recomm Rep 2005 54:1–12.
6. Hardeman W, Griffin S, Johnston M, et al. Interventions to prevent weight gain: a systematic review of psychological models and behaviour change methods. Int J Obes Relat Metab Disord 2000 24:131–43.
7. Avenell A, Broom J, Brown TJ, et al. Systematic review of the long-term effects and economic consequences of treatments for obesity and implications for health improvement. Health Technol Assess 2004 8:21.
8. Rothman AJ. "Is there nothing more practical than a good theory?": why innovations and advances in health behavior change will arise if interventions are used to test and refine theory. Int J Behav Nutr Phys Act 2004 1:11.
9. Jeffery RW. How can health behavior theory be made more useful for intervention research? Int J Behav Nutr Phys Act 2004 1:10.
10. Kroeze W, Werkman A, Brug J. A systematic review of randomized trials on the effectiveness of computer-tailored education on physical activity and dietary behaviors. Ann Behav Med 2006 31:205–23.
11. Strecher V, Wang C, Derry H, et al. Tailored interventions for multiple risk behaviors. Health Educ Res 2002 17:619–26.
12. MacKinnon DP, Lockwood CM, Hoffman JM, et al. A comparison of methods to test mediation and other intervening variable effects. Psychol Methods 2002 7:83–104.
13. MacKinnon D and Dwyer JH. Estimating mediated effects in prevention studies. Eval Rev 1993 17:144–58.
14. Bauman AE, Sallis JF, Dzewaltowski DA, et al. Toward a better understanding of the influences on physical activity: the role of determinants, correlates, causal variables, mediators, moderators, and confounders. Am J Prev Med 2002 23:5–14.
15. Kraemer HC, Wilson GT, Fairburn CG, et al. Mediators and moderators of treatment effects in randomized clinical trials. Arch Gen Psychiatry 2002 59:877–83.
16. Lord FM. Applications of Item Response Theory to Practical Testing Problems. Conference Proceedings. Mahwah, NJ: Lawrence Erlbaum Associates, Inc 1980.
17. Mâsse LC, Dassa C, Gauvin L, et al. Emerging measurement and statistical methods in physical activity research. Am J Prev Med 2002 23:44–55.
18. Petersen MA, Groenvold M, Aaronson N, et al. Item response theory was used to shorten EORTC QLQ-C30 scales for use in palliative care. J Clin Epidemiol 2006 59:36–44.
19. Petersen MA, Groenvold M, Aaronson N. Scoring based on item response theory did not alter the measurement ability of EORTC QLQ-C30 scales. J Clin Epidemiol 2005 58:902–8.
20. Prieto L, Alonso J, Lamarca R. Classical test theory versus Rasch analysis for quality of life questionnaire reduction. Health Qual Life Outcomes 2003 1:27.
21. Wilson M, Allen DD, Li JC. Improving measurement in health education and health behavior research using item response modeling: comparison with the classical test theory approach. Health Educ Res 2006 21:Suppl 1, i19–i32.
22. Wilson M, Allen DD, Li JC. Improving measurement in health education and health behavior research using item response modeling: introducing item response modeling. Health Educ Res 2006 21:Suppl 1, i4–i18.
23. McHorney CA. Generic health measurement: past accomplishments and a measurement paradigm for the 21st century. Ann Intern Med 1997 127:743–50.
24. Oenema A. Web-based tailored nutrition education: results of a randomized controlled trial. Health Educ Res 2002 16:647–60.
25. Marcus BH, Bock BC, Pinto BM, et al. Efficacy of an individualized, motivationally-tailored physical activity intervention. Ann Behav Med 1998 20:174–80.
26. Heesch KC, Mâsse LC, Dunn AL. Using Rasch modeling to re-evaluate three scales related to physical activity: enjoyment, perceived benefits and perceived barriers. Health Educ Res 2006 21:Suppl 1, i58–i72.
27. Watson K, Baranowski T, Thompson D. Item response modeling: an evaluation of the children's fruit and vegetable self-efficacy questionnaire. Health Educ Res 2006 21:Suppl 1, i47–i57.
28. Kendzierski D and DeCarlo KJ. Physical activity enjoyment scale: two validation studies. J Sport Exerc Psychol 1999 13:50–64.
29. Sallis JF, Hovell MF, Hofstetter CR, et al. A multivariate study of determinants of vigorous exercise in a community sample. Prev Med 1989 18:20–34.
30. Mâsse LC, Heesch KC, Eason KE, et al. Evaluating the properties of a stage-specific self-efficacy scale for physical activity using classical test theory, confirmatory factor analysis, and item response modeling. Health Educ Res 2006 21:Suppl 1, i33–i46.
31. Mâsse LC, Allen D, Wilson M, et al. Introducing equating methodologies to compare test scores from two different self-regulation scales. Health Educ Res 2006 21:Suppl 1, i110–i120.